PREFACE TO THE FIRST EDITION


The theory of factorial analysis is mathematical in nature, but
this book has been written so that it can, it is hoped, be read by
those who have no mathematics beyond the usual secondary 
school knowledge. Readers are, however, urged to repeat 
some at least of the arithmetical calculations for themselves. 
It is probable that the subject-matter of this book may 
seem to teachers and administrators to be far removed from 
contact with the actual work of schools. I would like 
therefore to explain that the incentive to the study of 
factorial analysis comes in my case very largely from the 
practical desire to improve the selection of children for 
higher education. When I was thirteen years of age and 
finishing an elementary school education, I won a “scholarship”
to a secondary school in the neighbouring town, one
of the early precursors of the present-day “free places”
in England. I have ever since then been greatly impressed 
by the influence that event has had on my life, and have 
spent a great deal of time in endeavouring to improve the 
methods of selecting pupils at that stage and in lessening
the part played by chance. It was inevitable that I should 
be led to inquire into the use of intelligence tests for this 
purpose, and inevitable in due course that the possibilities 
of factorial analysis should also come under consideration. 
It seemed to me that before any practical use could be 
made of factorial analysis a very thoroughgoing examina- 
tion of its mathematical foundations was necessary. The 
present book is my attempt at this. It may seem remote
from school problems. But much mathematical study and
many calculations have to precede every improvement
in engineering, and it will not be otherwise in the future
with the social as well as with the physical sciences.

GODFREY H. THOMSON
MORAY HOUSE,
UNIVERSITY OF EDINBURGH, 
November 1938 


PREFACE TO THE FIFTH EDITION 


In earlier editions since the first, the chief changes in the
second edition were that the original chapter on Simple 
Structure was expanded into three, to cover oblique 
factors and second-order factors, while Dr. D. N. Lawley 
provided a chapter on factor analysis by maximum like- 
lihood, and a corresponding section in the mathematical 
appendix. The main changes in the third edition con- 
cerned the identity of simple structure factors after 
univariate selection, and the relations between two sets of 
variates. In the fourth, the principal addition was of 
Lawley’s formulae for the standard errors of individual
residuals, and of factor loadings, when maximum likelihood 
methods have been used. 

In the present (the fifth) edition it has for the first time 
been possible to reset the whole book. This has permitted 
more extensive alterations to be made, and the oppor- 
tunity has been taken of rearranging the order of the chap- 
ters and recasting several of them, as well as inserting in 
their proper places in the text those pages which in former 
editions had to be added as appendices. Chapters V, 
VIII, and X will supply the minimum of technique, and the 
remainder of Parts II and III will give in addition a descrip- 
tion of the methods of analysis using principal components, 
using the principle of maximum likelihood, or using 
Thurstone’s Simple Structure. 

I hope, however, that readers will not merely use the book 
as a set of recipes on how to carry out certain computations, 
but will study the geometrical explanations (twelve new 
diagrams have been added): and especially that they will 
ponder the implications of the two chapters, XVIII and 
XIX, on the influence of selection on factors, and the final 
two chapters on the sampling theory and certain funda- 
mental questions. 

GODFREY H. THOMSON

UNIVERSITY OF EDINBURGH,

April 1951 
All science starts with hypotheses—in other words,
with assumptions that are unproved, while they may be, 
and often are, erroneous; but which are better than 
nothing to the searcher after order in the maze of pheno- 
mena. 


T. H. HUXLEY


I am not insensible of the advantage which accrues to 
Applied Mathematics from the co-operation of the Pure 
Mathematician, and this co-operation is not infrequently 
called forth by the very imperfections of writers on Applied 
Mathematics. 

R. A. FISHER 
THE TWO-FACTOR THEORY AND ITS
EXTENSIONS 
CHAPTER I
THE THEORY OF TWO FACTORS 


1. Factor tests.—The object of this book is to give some 
account of the “factorial analysis” of ability, as it is
called. In actual practice at the present day this science
is endeavouring (with what hope of success is a matter of
keen controversy) to arrive at an analysis of mind based
on the mathematical treatment of experimental data
obtained from tests of intelligence and of other qualities, 
and to improve vocational and scholastic advice and 
prediction by making use of this analysis in individual 
cases. It is a development of the “ testing ” movement— 
the movement in which experimenters endeavour to devise 
tests of intelligence and other qualities in the hope of 
sorting mankind, and especially children, into different 
categories for various practical purposes ; educational (as 
in directing children into the school courses for which they 
are best suited); administrative (as in deciding that some 
persons are so weak-minded as to need lifelong institutional 
care); or vocational, etc.

There are many psychologists who would deny that from 
the scores in such tests, or indeed from any analysis, we 
can ever return to a full picture of the individual; and
without entering into any discussion of the fundamental 
controversy which this denial reveals, everyone who has 
had anything to do with tests will readily agree that this 
is certainly so at present in practice. But the tester may 
be allowed to try to make his modest diagram of the 
individual better, more useful, and if possible simpler. 

Now, the broadest fact about the results of “ tests” of 
all sorts, when a large number of them is given to a large 
number of people, is that every individual and every test 
is different from every other, and yet that there are certain 
rather vague similarities which run through groups of 
people or groups of tests, not very well marked off from 
one another but merging imperceptibly into neighbouring
groups at their margins. To describe an individual accurately
and completely one would have to administer to
him all the thousand and one tests which have been or 
may be devised, and record his score in each, an impossible 
plan to carry out, and an unwieldy record to use even if 
obtained. Both practical necessity and the desire for 
theoretical simplification lead one to seek for a few tests 
which will describe the individual with sufficient accuracy, 
and possibly with complete accuracy if the right tests can 
be found. If, as has been said, there is some tendency 
for the tests to fall into groups, perhaps one test from each 
group may suffice. Such a set of tests might then be said 
to measure the “ factors ” of the mind. 

2. Fictitious factors.—Actually the progress of the 
“factorial” movement has been rather different, and the
factors are not real but as it were fictitious tests which 
represent certain aspects of the whole mind. But con- 
ceivably it might have taken the more concrete form. In 
that case the “ factor tests ” finally decided upon (by 
whom, the reader will ask, and when “ finally ” ?) would 
be a set of standards which, like any other standards, would 
have to be kept inviolate, and unchanged except at rare 
intervals and for good reasons. Some tendency towards 
this there has been. The Binet scale of tests is almost an 
international standard, and there is a general agreement 
that it must not be changed except by certain people upon 
whose shoulders Binet’s mantle has fallen, and only seldom 
and as little as possible even by them. But the Binet 
scale is a very complex entity, and rather represents many 
groups of tests than any one test. By “ factor tests ” one 
would more naturally mean tests of a “ pure” nature, 
differing widely from one another so as to cover the whole 
personality adequately. And since actual tests always 
are more or less mixed, it is understandable why “ factors ” 
have come to be fictitious, not real, tests, to be each 
approximated to by various combinations of real tests so 
weighted that their unwanted aspects tend to cancel out, 
and their desired aspects to reinforce one another, the team 
approximating to a measure of the pure “ factor.” 
But how, the reader will ask, do we know a “pure”
factor, how are we to tell when the actual tests approximate
to it? To give a preliminary answer to that question we
must go back to the pioneer work of Professor Charles
Spearman in the early years of this century (Spearman, 
1904). The main idea which still, rightly or wrongly, 
dominates factorial analysis was enunciated then by him, 
and practically all that has been done since has been either 
inspired or provoked by his writings. His discovery was 
that the “coefficients of correlation” between tests tend
to fall into “ hierarchical order,” and he saw that this 
could be explained by his famous “ Theory of Two Factors.” 
These technical terms we must now explain. 

3. Hierarchical order.—A coefficient of correlation is a
number which indicates the degree of resemblance between
two sets of marks or scores. If a schoolmaster, for example,
gives two examination papers to his class, say (1) in arithmetic
and (2) in grammar, he will have two marks for every
boy in the class. If the two sets of marks are identical
the correlation is perfect, and the correlation coefficient,
denoted by the symbol $r_{12}$, is said to be +1. If by some
curious chance the one list of marks is exactly like the
other one upside down (the best boy at arithmetic being
worst at grammar, and so on), the correlation is still perfect,
but negative, and $r_{12} = -1$. If there is absolutely no
resemblance between the two lists, $r_{12} = 0$. If there is a
strong resemblance, but falling short of identity, $r_{12}$ may
equal .9; and so on. There is a method (the Bravais-Pearson)
of calculating such coefficients, given the list of
marks.* “Tests” can obviously be correlated just like
examinations, and a convenient form in which to write
down the intercorrelations of a number of tests is in a
square chequer board with the names of the tests (say
a, b, c . . .) written along the two margins, thus:

             a      b      c      d      e      f
  a          .     .48    .24    .54    .42    .30
  b         .48     .     .32    .72    .56    .40
  c         .24    .32     .     .36    .28    .20
  d         .54    .72    .36     .     .63    .45
  e         .42    .56    .28    .63     .     .35
  f         .30    .40    .20    .45    .35     .

  Totals   1.98   2.48   1.40   2.70   2.24   1.70

It was early found that such correlations tend to be
positive, and it is of some interest to see which of a number
of tests correlates most with the others. This can be found
by adding up the columns of the chequer board, when we
see in the above example that the column referring to
Test d has the highest total (2.70). The tests can then be
rearranged and numbered in the order of these totals, thus:

             1      2      3      4      5      6
             d      b      e      a      f      c
  1  d       .     .72    .63    .54    .45    .36
  2  b      .72     .     .56    .48    .40    .32
  3  e      .63    .56     .     .42    .35    .28
  4  a      .54    .48    .42     .     .30    .24
  5  f      .45    .40    .35    .30     .     .20
  6  c      .36    .32    .28    .24    .20     .

After the tests have been thus arranged, the tendency
which Professor Spearman was the first to notice, and which
he called “hierarchical order,” is more easily seen. It is
the tendency for the coefficients in any two columns to have
a constant ratio throughout the column. Thus in our
example, if we fix our attention on Columns a and f, say,
they run (omitting the coefficients which have no partners)
thus:

     a      f
    .54    .45
    .48    .40
    .42    .35
    .24    .20

and every number on the right is five-sixths of its partner
on the left.

* The “product-moment formula” is:

$$ r_{12} = \frac{\operatorname{sum}(x_1 x_2)}{\sqrt{\operatorname{sum}(x_1^2) \times \operatorname{sum}(x_2^2)}} $$

where $x_1$ and $x_2$ are the scores in the two tests, measured from the
average (so that approximately half the scores are negative), and
the sums are over the persons to whom the scores apply. The
quantity

$$ \sigma_1^2 = \frac{\operatorname{sum}(x_1^2)}{\text{number of persons}} $$

is called the variance of Test 1, and $\sigma_1$ its standard deviation. If the
scores in each test are not only measured from their average, but
are then divided through by their standard deviation, they are said
to be standardized, and we represent them by $z_1$ and $z_2$. About
two-thirds of them, then, lie between plus and minus one. With
such scores Pearson’s formula becomes:

$$ r_{12} = \frac{\text{sum of the products } z_1 z_2}{\text{number of persons } p} $$

In theoretical work, an even larger unit is used, namely $\sigma\sqrt{p}$.
With these units, the sum of the squares is unity, and the sum of the
products is the correlation coefficient. The scores are then said to
be normalized, but note that this does not mean distributed in a
“normal” or Gaussian manner.
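The product-moment formula of the footnote to this section can be tried on a small set of marks. What follows is only an illustrative sketch: the function names and the two lists of marks are invented for the purpose and do not come from the text.

```python
from math import sqrt

def product_moment(x1, x2):
    """Bravais-Pearson coefficient from two equally long lists of raw scores."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    # Measure every score from its own average.
    d1 = [a - mean1 for a in x1]
    d2 = [b - mean2 for b in x2]
    cross = sum(a * b for a, b in zip(d1, d2))
    return cross / sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))

def standardize(x):
    """Deviations from the average, divided by the standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = sqrt(sum((a - mean) ** 2 for a in x) / n)
    return [(a - mean) / sd for a in x]

# Hypothetical marks of six boys in two papers (not data from the book).
arithmetic = [12, 15, 9, 18, 11, 14]
grammar = [10, 14, 8, 15, 13, 12]

r12 = product_moment(arithmetic, grammar)

# With standardized scores the same coefficient is the mean product z1 * z2.
z1, z2 = standardize(arithmetic), standardize(grammar)
r12_from_z = sum(a * b for a, b in zip(z1, z2)) / len(z1)

print(round(r12, 3), round(r12_from_z, 3))
```

Both routes give the same coefficient, and the squares of the standardized scores sum to the number of persons, as the footnote states.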

Our example is a fictitious one, and the tendency to 
hierarchical order in it has been made perfect in order to 
emphasize the point. It must not be supposed that the 
tendency is as clear in actual experimental data. Indeed, 
at the time there were some who denied altogether the 
existence of any such tendency in actual data. Those who 
did so were, however, mistaken, although the tendency is 
not as strong as Professor Spearman would seem originally 
to have thought (Spearman and Hart, 1912). The follow- 
ing is a small portion of an actual table of correlation coeffi- 
cients* from those days (Brown, 1910, 309). (Complete 
tables must, of course, include many more tests ; in recent 
work as many as 57 in one table.) 


         1      2      3      4      5      6
  1      .     .78    .45    .27    .59    .30
  2     .78     .     .48    .28    .51    .24
  3     .45    .48     .     .52    .40    .38
  4     .27    .28    .52     .     .41    .38
  5     .59    .51    .40    .41     .     .13
  6     .30    .24    .38    .38    .13     .


* In this, as in other instances where data for small examples are 
taken from experimental papers, neither criticism nor comment is 
in any way intended. Illustrations are restricted to few tests for 
economy of space and clearness of exposition, but in the experiments 
from which the data are taken many more tests are employed, and 
the purpose may be quite different from that of this book. 
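The rearrangement by column totals and the constant ratio between columns, described in Section 3 for the fictitious example, can be checked mechanically. The sketch below assumes the fictitious correlations as given in that section; the helper names are invented for the illustration.

```python
# Correlations of the fictitious example, keyed by unordered pairs of tests.
pairs = {
    "ab": .48, "ac": .24, "ad": .54, "ae": .42, "af": .30,
    "bc": .32, "bd": .72, "be": .56, "bf": .40,
    "cd": .36, "ce": .28, "cf": .20,
    "de": .63, "df": .45,
    "ef": .35,
}
tests = "abcdef"

def corr(x, y):
    """Look up the correlation of two different tests, in either order."""
    return pairs[min(x, y) + max(x, y)]

# Column totals of the chequer board (diagonal cells left out).
totals = {t: sum(corr(t, u) for u in tests if u != t) for t in tests}

# Rearrange the tests in descending order of their totals.
order = sorted(tests, key=lambda t: totals[t], reverse=True)

# Ratio of column f to column a, row by row, skipping rows without partners.
ratios = [corr(t, "f") / corr(t, "a") for t in order if t not in "af"]

print(order)
print([round(x, 4) for x in ratios])
```

In a perfectly hierarchical table every such ratio is the same; in experimental tables like the one from Brown the ratios only tend towards constancy.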
4. G saturations.—This tendency to “hierarchical order”
was explained by Professor Spearman by the hypothesis
that all the correlations were due to one “factor” only,
present in every test, but present in largest amount in the
test at the head of the hierarchy. This factor is his famous 
“ g,” to which he gave only this algebraic name to avoid 
making any suggestions as to its nature, although in some 
papers and in The Abilities of Man he permitted himself 
to surmise what that nature might be. Each test had also 
a second factor present in it (but not to be found elsewhere, 
except indeed in very similar varieties of the same test), 
whence the name, “Theory of Two Factors”—really one
general factor, and innumerable second or specific factors. 

It will be proved in the Mathematical Appendix* that 
this arrangement would actually give rise to “ hierarchical 
order.” Meanwhile this can at least be made plausible. 
For if Test d has that column of correlations (the first 
in our table) with the other tests solely because it is 
saturated with so-and-so much g; and if Test b has less g 
in it than d has, it seems likely enough that b’s column of 
correlations will all be smaller in that same proportion. 
We can, moreover, find what these “ saturations ” with g 
are. For on the theory, each of our six tests contains the 
factor g, and another part which has nothing to do with 
causing correlation. Moreover, the higher the test is in 
the hierarchical ranking, the more it is “ saturated ” with g. 
Imagine now a fictitious test which had no specific, a test 
for g and for nothing else, whose saturation with g is 100 per
cent., or 1.0. This fictitious test would, of course, stand
at the head of the hierarchy, above our six real tests, and 
its row of correlations with each of those tests (their 
“ saturations ”) would each be larger than any other in the 
same column. What values would these saturations take ? 

Before we answer this, let us direct our attention to the
diagonal cells of the “matrix” of correlations (as it is
called: a matrix is just a square or oblong set of numbers),
cells which we have up to the present left blank. Since
each number in our matrix represents the correlation of the
two tests in whose column and row it stands, there should
be inserted in each diagonal cell the number unity, representing
the correlation of a test with its own identical self.
In these self-correlations, however, the specific factor of
each test, of course, plays its part. These self-correlations
of unity are the only correlations in the whole table in
which specifics do play any part. These “unities,” therefore,
do not conform to the hierarchical rule of proportionality
between the columns.

           g       1       2       3       4       5       6
  g        1     r_1g    r_2g    r_3g    r_4g    r_5g    r_6g
  1      r_1g      .      .72     .63     .54     .45     .36
  2      r_2g     .72      .      .56     .48     .40     .32
  3      r_3g     .63     .56      .      .42     .35     .28
  4      r_4g     .54     .48     .42      .      .30     .24
  5      r_5g     .45     .40     .35     .30      .      .20
  6      r_6g     .36     .32     .28     .24     .20      .

* Para. 3; and see also Chapter XVIII, end of Section 6, page 283.

But the case is different with the fictitious test of pure g.
It has no specific, and its self-correlation of unity should
conform to the hierarchy. If, therefore, we call the
“saturations” of the other tests $r_{1g}$, $r_{2g}$, $r_{3g}$, $r_{4g}$, $r_{5g}$ and $r_{6g}$,
we see that we must have, as we come down the first two
columns within the matrix:

$$ \frac{r_{1g}}{1} = \frac{.72}{r_{2g}} = \frac{.63}{r_{3g}} = \frac{.54}{r_{4g}} = \frac{.45}{r_{5g}} = \frac{.36}{r_{6g}} $$

and similar equations for each other column with the g
column, which together indicate that the six “saturations”
are

    .9    .8    .7    .6    .5    .4

Furthermore, each correlation in the table is the product
of two of these saturations. Thus:

$$ .72 = .9 \times .8, \qquad .42 = .7 \times .6, \qquad r_{34} = r_{3g} \times r_{4g} $$

The six tests can now be expressed in the form of
equations:

$$
\begin{aligned}
z_1 &= .9g + .436\,s_1 \\
z_2 &= .8g + .600\,s_2 \\
z_3 &= .7g + .714\,s_3 \\
z_4 &= .6g + .800\,s_4 \\
z_5 &= .5g + .866\,s_5 \\
z_6 &= .4g + .917\,s_6
\end{aligned}
$$
Herein, each z represents the score of some person in the
test indicated by the subscript, a score made up of that 
person’s g and specific in the proportions indicated by the 
coefficients. The scores are supposed measured from the 
average of all persons, being reckoned plus if above the 
average and minus if below; and so too are the factors g 
and the specifics. And each of them, tests and factors, is
“ standardized,” i.e. measured in such units that the sum 
of the squares of all the scores equals the number of 
persons. This is achieved by dividing the raw scores by the 
“standard deviation.” The saturations of the specifics 
are such that the sum of the squares of both saturations 
comes in each test to unity, the whole variance of that test. 


Thus: $.436 = \sqrt{1 - .9^2}$
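The saturations quoted in this section can be recovered from the correlations alone. The sketch below uses one possible route, assumed here for illustration rather than taken from the text: since each correlation is the product of two saturations, the square of the saturation of Test i equals r_ij × r_ik ÷ r_jk for any two other tests j and k. The function name is invented.

```python
from itertools import combinations
from math import sqrt

# Fictitious hierarchy, tests in the order d, b, e, a, f, c (numbered 1 to 6).
# Diagonal cells are left empty, as in the text.
R = [
    [None, .72, .63, .54, .45, .36],
    [.72, None, .56, .48, .40, .32],
    [.63, .56, None, .42, .35, .28],
    [.54, .48, .42, None, .30, .24],
    [.45, .40, .35, .30, None, .20],
    [.36, .32, .28, .24, .20, None],
]

def g_saturation(R, i):
    """Average the estimates r_ij * r_ik / r_jk of the squared saturation."""
    others = [j for j in range(len(R)) if j != i]
    triads = [R[i][j] * R[i][k] / R[j][k] for j, k in combinations(others, 2)]
    return sqrt(sum(triads) / len(triads))

saturations = [g_saturation(R, i) for i in range(len(R))]

# The specific loading completes each test's variance to unity.
specifics = [sqrt(1 - r * r) for r in saturations]

for number, (r, s) in enumerate(zip(saturations, specifics), start=1):
    print(f"z{number} = {r:.1f} g + {s:.3f} s{number}")
```

With experimental data the separate estimates for one test would disagree somewhat, which is why the sketch averages them; in this perfect example they all coincide.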


5. A weighted battery.—This brief outline of the Theory
of Two Factors must for the moment suffice. It is
enough to enable the question to be answered which at the
end of our Section 2 led to the digression. “How,” the
reader asked, “do we know a pure factor, how are we to
tell when the actual tests approximate to it?” In the
Two-factor Theory the important pure factor was g itself, 
and a test approximated to it the more, the higher it stood 
in the hierarchy. Its accuracy of measurement of g was 
indicated by its “saturation.” And a battery of hierarchical
tests could be weighted so as to have a combined
saturation higher than that of any one member, each test
for this purpose being weighted (as will be shown in Chapter
XV) by a number proportional to

$$ \frac{r_{ig}}{1 - r_{ig}^2} $$

where $r_{ig}$ is the g saturation of Test i (Abilities, p. xix). The battery
saturation or multiple correlation with g is then

$$ \sqrt{\frac{S}{1 + S}}, \qquad \text{where } S = \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2} $$

Although g remained a fiction, yet a complex test, made up


of a weighted battery of tests which were hierarchical, 
could approach nearer and nearer to measuring it exactly, 
as more tests were added to the hierarchy. Each test added
would have to conform to the rule of proportionality in its
correlations with the pre-existing battery. If it did not
do so it would have to be rejected. The battery at any
stage would form a kind of definition of g, which it approached
although never reached. And a man’s weighted
score in such a battery would be an estimate of his amount 
of g, his general intelligence. The factorial description of 
a man was at this period confined to one factor, since the 
specific factors were useless as description of any man. 
For one thing, they were innumerable ; and for another, 
being specific, they were only able to indicate how the man 
would perform in the very tests in which, as a matter of 
fact, we knew exactly how he had performed. 
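The weighting rule and the battery saturation of this section lend themselves to a short numerical check. The sketch below assumes the six saturations of the fictitious example; the function names are invented. It also recomputes the correlation of the weighted sum with g directly from the two-factor model, as a cross-check on the formula in S.

```python
from math import sqrt

def battery_weights(saturations):
    """Weight of each test, proportional to r / (1 - r^2)."""
    return [r / (1 - r * r) for r in saturations]

def battery_saturation(saturations):
    """Multiple correlation of the weighted battery with g: sqrt(S / (1 + S))."""
    s = sum(r * r / (1 - r * r) for r in saturations)
    return sqrt(s / (1 + s))

def composite_correlation_with_g(saturations):
    """Correlation of the weighted sum of standardized tests with g,
    worked out directly: r_ij = r_i * r_j off the diagonal, unity on it."""
    w = battery_weights(saturations)
    covariance_with_g = sum(wi * ri for wi, ri in zip(w, saturations))
    variance = 0.0
    for i, (wi, ri) in enumerate(zip(w, saturations)):
        for j, (wj, rj) in enumerate(zip(w, saturations)):
            r_ij = 1.0 if i == j else ri * rj
            variance += wi * wj * r_ij
    return covariance_with_g / sqrt(variance)

saturations = [.9, .8, .7, .6, .5, .4]
battery = battery_saturation(saturations)
direct = composite_correlation_with_g(saturations)
print(round(battery, 3), round(direct, 3))
```

The battery's saturation exceeds that of its best single member, and adding a further hierarchical test raises it again without ever bringing it to unity.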

6. Oval diagrams.—It is convenient at this point to
introduce a diagrammatic illustration which will be useful
in the less technical part of this book, although like all
illustrations it must be taken only as such, and the
analogy must not be pushed too far. If we represent the
two abilities, which are measured by tests, by two
overlapping ovals as in Figure 1, then the amount of the
overlap can be made to represent the degree to which these
tests are correlated. If we call the whole area of each oval
the “variance” of that ability, we shall be introducing the
reader to another technical term (of which a definition was
given in the footnote to page 5). Here it need mean
nothing more than the whole “amount” of the ability. The
overlap we shall call the “covariance.” If the two
variances are each equal to unity, then the covariance is
the correlation coefficient. To make the diagram quantitative,
we can indicate in figures the contents of each part of
the variance, as in the instance shown, which gives a
correlation of 3/5, or .6. If the separate parts of each
variance (i.e. of each oval) do not add up to the same
quantity, but to $v_1$ and $v_2$ say, then the covariance (the
amount in the overlap) must be divided by $\sqrt{v_1 v_2}$ in order
to give the correlation. Thus, Figure 2 represents a
correlation of $3 \div \sqrt{4 \times 9} = .5$. No attempt is made
in the diagrams to make the actual areas proportional to the
parts of the variance; it is the numbers written in each cell
which matter.

[Figure 1.]
[Figure 2.]
[Figure 3.]
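The rule for reading a correlation off an oval diagram reduces to one line of arithmetic. In the sketch below the function name and the way the variances are split into cells are assumptions made for illustration; only the totals for Figure 2 (covariance 3, variances 4 and 9) come from the text.

```python
from math import sqrt

def oval_correlation(only_first, overlap, only_second):
    """Correlation from the contents of the three cells of a two-oval diagram."""
    v1 = only_first + overlap    # whole variance of the first ability
    v2 = only_second + overlap   # whole variance of the second ability
    return overlap / sqrt(v1 * v2)

# Figure 2 as described in the text: covariance 3, variances 4 and 9.
figure_2 = oval_correlation(1, 3, 6)

# A hypothetical split giving equal variances of 5 and a correlation of 3/5.
equal_variances = oval_correlation(2, 3, 2)

print(figure_2, equal_variances)
```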

The four abilities represented by four tests can clearly 
overlap in a complicated way, as in Figure 3, which shows 
one part of the variance (marked g) common to all four of 
the tests; four parts (left unshaded) each common to three
tests; six parts (shaded) each common to two tests; and
four outer parts (marked s) each specific to one test only.
The early Theory of Two Factors adopted the hypothesis 
that, except for very similar varieties of the one test, none 
of the cells of such a diagram had any contents save those 
marked g and s, the general and the specific factors. The
“variance” of each ability was in that theory completely
accounted for by the variance due to g, and the variance 
due to s. 

7. Tetrad-differences.—In Section 3 it was explained that
the discovery made by Professor Spearman was that the
correlation coefficients in two columns tend to be in the
same ratio as we go up and down the pair of columns.
That is to say, if we take the columns belonging to Tests
b and f, and fix our attention on the correlations which
b and f make with d and e, we have:

          b      f
  d      .72    .45
  e      .56    .35

where

$$ \frac{.72}{.45} = \frac{.56}{.35} $$

This may be written:

$$ .72 \times .35 - .45 \times .56 = 0 $$
and in this form is called a “tetrad-difference.” In
symbols this one is:

$$ r_{db}\, r_{ef} - r_{df}\, r_{eb} = 0 $$

Spearman’s discovery may therefore be put thus: “The
tetrad-differences are, or tend to be, zero.” It is clear that
this will be so if, as we said was the case in the Theory of
Two Factors, each correlation is the product of two correlations
with g. For then the above tetrad-difference
becomes

$$ r_{dg}\, r_{bg}\, r_{eg}\, r_{fg} - r_{dg}\, r_{fg}\, r_{eg}\, r_{bg} $$

which is identically zero. The present-day test for hierarchical
order in a correlation matrix is to calculate all the
tetrad-differences (always avoiding the main diagonal) and
see if they are sufficiently small. If they are, then the
correlations can be explained by a diagram of the same
nature as Figure 3, by one general factor and specifics. It
is, of course, not to be expected in actual experimenting
that the tetrad-differences will be exactly zero; no experiment
on human material can be as accurate as that. What
is required is that they shall be clustered round zero in a 
narrow curve, falling off steadily in frequency as zero is 
departed from. The number of tetrad-differences increases 
very rapidly as the number of tests grows, and in an actual 
experimental battery the tetrads are very numerous indeed. 
In the small portion of a real correlation table given above 
(page 7), with six tests, there are 45 tetrad-differences,* 
and in this instance they are distributed as follows (taking 
absolute values only and disregarding signs, which can be 


changed by altering the order of the tests) : 
From .0000 to .0999, 28 tetrad-differences.
From .1000 to .1999, 13 tetrad-differences.
From .2000 to .2796, 4 tetrad-differences.
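The count of 45 and the three class frequencies can be reproduced by taking every set of four tests and forming its three tetrad-differences. The sketch below is illustrative only: the function name is invented, and the table is the small portion of Brown's correlations quoted in Section 3 as read here from a damaged copy, so individual entries (notably the correlation of Tests 5 and 6, taken as .13) are assumptions.

```python
from itertools import combinations

# Small portion of the experimental table of Section 3 (diagonal left empty).
R = [
    [None, .78, .45, .27, .59, .30],
    [.78, None, .48, .28, .51, .24],
    [.45, .48, None, .52, .40, .38],
    [.27, .28, .52, None, .41, .38],
    [.59, .51, .40, .41, None, .13],
    [.30, .24, .38, .38, .13, None],
]

def tetrad_differences(R):
    """Three tetrad-differences for every choice of four tests."""
    found = []
    for a, b, c, d in combinations(range(len(R)), 4):
        p1 = R[a][b] * R[c][d]
        p2 = R[a][c] * R[b][d]
        p3 = R[a][d] * R[b][c]
        found += [p1 - p2, p1 - p3, p2 - p3]
    return found

sizes = [abs(t) for t in tetrad_differences(R)]

# Class frequencies by absolute size, as in the text.
frequencies = [
    sum(1 for s in sizes if s < .1),
    sum(1 for s in sizes if .1 <= s < .2),
    sum(1 for s in sizes if s >= .2),
]
print(len(sizes), frequencies, round(max(sizes), 4))
```

Six tests give 15 sets of four and so 45 tetrad-differences; the largest, .2796, comes from Tests 1, 2, 3 and 4.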


This distribution of tetrads can be represented by a 
“ histogram ” like that shown in Figure 4, which explains 
itself. It is clear that some criterion is required by which 
we can know whether the distribution of tetrad-differences, 
after they have been calculated, is narrow enough to justify 
us in assuming the Theory of Two Factors. This criterion 


* Not all independent. 
is explained in Chapter III, page 41. One form of it consists
in drawing a distribution curve to which, on grounds
of sampling, the tetrad-differences may be expected to conform.
Any tetrad-differences which seem to be too large
to be accounted for by the Theory of Two Factors are then 
examined, to see whether the tests giving them have any 
special points of resemblance, 
in content, method, or other- 
wise, which may explain why 
they disturb the hierarchy. 

[Figure 4. Histogram of the tetrad-differences.]

8. Group factors.—As time went on it became clear that
the tendency to zero tetrad-differences, though strong, was
not universal enough to permit an explanation of all
correlations between tests in terms of g and specifics, with
a few slight “disturbers” in the form of slightly overlapping
specifics. It became necessary to call in group factors,
which run through many though not through all tests,
to explain the deviations from strict hierarchical order.
The Spearman school of experimenters, however, tend
always to explain as much as possible by one central
factor, and to use group factors only when necessitated.
A group factor must as it were establish its right to
existence; the burden of proof is on him who asserts it.
To take an artificial illustration, let a matrix of
correlation coefficients:

         1      2      3      4
  1      .     .5     .5     .5
  2     .5      .     .8     .5
  3     .5     .8      .     .5
  4     .5     .5     .5      .

be examined, and its three tetrad-differences found. They
are:

    zero,   .15,   .15
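As a check on the arithmetic, the three tetrad-differences of four tests can be written out in full. The working below assumes that every correlation in the artificial matrix is .5 except the correlation of Tests 2 and 3, which is .8; signs depend on the order in which the tests are taken.

```latex
\begin{aligned}
r_{12}\,r_{34} - r_{13}\,r_{24} &= (.5)(.5) - (.5)(.5) = 0 \\
r_{12}\,r_{34} - r_{14}\,r_{23} &= (.5)(.5) - (.5)(.8) = -.15 \\
r_{13}\,r_{24} - r_{14}\,r_{23} &= (.5)(.5) - (.5)(.8) = -.15
\end{aligned}
```

Only the two differences that contain the correlation of Tests 2 and 3 depart from zero.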
Inspection shows that the correlation $r_{23}$ is the cause of
the discrepancies from zero, and the experimenter trained 
in the Two-factor school would therefore explain these 
correlations by a central factor running through them all, 
plus a special link joining Tests 2 and 3, as in Figure 5. 

There are innumerable other possible ways of explaining
these same correlations. For example, the linkages between
the tests might be as in Figure 6, which gives exactly the
same correlations. This lack of uniqueness is something
which must always be borne in mind in studying factorial
analysis. There are always, as here, innumerable possible
analyses, and the final decision between them has to be
made on some other grounds. The decision may be
psychological, as when for example in the above case an
experimenter chooses one of the possible diagrams because
it best agrees with his psychological ideas about the tests.
Or the decision may be made on the ground that we should
be parsimonious in our invention of “factors,” and that
where one general and one group factor will serve we should
not invent five group factors as required by Figure 6.
Both diagrams, however, fit the correlational facts exactly,
and so also would hundreds of other diagrams which might
be made. As has been said, the two-factor tendency is to
take the diagram with the largest general factor (and the
largest specifics also) and with as few group factors as
possible.
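The lack of uniqueness can be made concrete with a small calculation. The sketch below does not reproduce the linkages of Figure 6; it uses a different and simpler device, assumed here for illustration: a set of loadings on a general factor and a link between Tests 2 and 3 is turned through an arbitrary angle, and the turned loadings give exactly the same correlations between tests. The loadings themselves (the square roots of .5 and .3) are chosen to reproduce the artificial matrix of Section 8, with every correlation .5 except that of Tests 2 and 3 at .8.

```python
from math import cos, radians, sin, sqrt

# Assumed loadings of four tests on two common factors:
# a general factor, and a link running through Tests 2 and 3 only.
general, link = sqrt(.5), sqrt(.3)
loadings = [
    [general, 0.0],
    [general, link],
    [general, link],
    [general, 0.0],
]

def correlations(loadings):
    """Correlation of tests i and j as the sum of products of their loadings.
    Diagonal cells hold the common part of each variance, not unity."""
    n = len(loadings)
    return [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
             for j in range(n)] for i in range(n)]

def rotate(loadings, degrees):
    """Turn the pair of factor axes through the given angle."""
    c, s = cos(radians(degrees)), sin(radians(degrees))
    return [[c * x + s * y, -s * x + c * y] for x, y in loadings]

original = correlations(loadings)
turned = correlations(rotate(loadings, 30))

print([round(v, 3) for v in original[1]])
print([round(v, 3) for v in turned[1]])
```

After the turn neither factor is general in the earlier sense, yet no correlation has changed; the choice between such descriptions has to rest on other grounds, as the text says.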

[Figure 6.]

9. The verbal factor.—In this way the Theory of Two
Factors has gradually extended the “two” to include, in
addition to g and specifics, a number of other group factors,
still, however, comparatively few. These group factors
bear such names as the verbal factor v, a mechanical factor
m, an arithmetic factor, perseveration, etc. The characteristic
method of the Two-factor school can be well
seen, without any technical difficulties unduly obscuring
the situation, in the search for a verbal factor. The idea 
that, in addition to a man’s g (which is generally thought 
of as something innate) there may be an acquired factor 
of verbal facility which enables him to do well in certain 
tests, is a not unnatural one. A battery of tests can be
assembled of which half do, and half do not, employ words 
in their construction or solution. The correlation matrix 
will then have four quadrants, the quadrant V containing
the correlations of the verbal tests among themselves, the
quadrant P the correlations of the non-verbal or, say,
pictorial tests, and the quadrants C containing the
cross-correlations of the one kind of test with the other:

               verbal    pictorial
  verbal         V          C
  pictorial      C          P

If the whole table is sufficiently “hierarchical,” there is no
evidence for a group factor v or a group factor p. If
either of these factors exists, there will be differences to be

noticed between the six kinds of tetrad which can be
chosen, namely:

       (1)              (2)              (3)
       p   p            v   v            p   p
   v   .   .        v   @   @        p   @   @
   v   .   .        v   @   @        p   @   @

       (4)              (5)              (6)
       v   p            v   p            v   p
   v   @   .        p   .   @        v   @   .
   v   @   .        p   .   @        p   .   @
A tetrad like 1, with two verbal tests along one margin
and two pictorial tests along the other, will be found in
quadrant C. Neither a factor common to the verbal tests
only, nor one common to the pictorial tests only, will add
anything to any of the four correlations in such a tetrad-
difference, which may be expected, therefore, to tend to be
zero. If the tetrads in C seem to do so, the other tetrads
can be examined. Tetrad 2 is taken wholly from the V
quadrant. In it the verbal factor, if any is present, will 
reinforce all the four correlations, and should not therefore 
disturb very much the tendency to a zero tetrad-difference. 
(Reinforced correlations are marked by @ in the diagrams.) 
The same is true of Tetrad 3 taken wholly from the P 
quadrant. Tetrads 4 and 5 have each two of their cor- 
relations reinforced, by the v factor in 4 and by the p 
factor in 5, but in each case in such a way as not to change 
very much the tetrad-difference. It is when we come to 
tetrads like 6, which have one correlation in each of the 
four quadrants, that the presence of either or both factors 
should show itself strongly : for the two reinforced correla- 
tions here occur on a diagonal, and inflate only the one 
member of the tetrad-difference :

$r_{vv} r_{pp} - r_{vp} r_{pv}$

If, then, a verbal factor, and also a pictorial factor, are 
present, the tendency for the tetrad-differences to vanish 
should become less and less strong as we consider tetrads 
of the kinds 1, 2 and 3, 4 and 5, and especially 6, where 
the tetrad-differences should leap up. If only the verbal 
factor is present, tetrad-differences of the kind 3 should 
vanish rather more than those of the kind 2. But it will 
not be easy to distinguish between either suspected factor, 
and both. Tetrads like 6, however, should give conclusive 
evidence of the presence of one or the other, if not both. 
Methods like this were employed by Miss Davey (Davey, 
1926), who found a group factor, but not one running 
through all the verbal tests, and by Dr. Stephenson 
(Stephenson, 1931), whose results indicated the presence
of a verbal factor.*
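
The reasoning about the six kinds of tetrad can be checked numerically. What follows is only a sketch under stated assumptions, not data from any battery discussed here: the loadings are invented, every verbal test is given a loading on g and on v, every pictorial test a loading on g and on p, and the correlation of two tests is taken to be the sum of the products of their loadings on the factors they share. The names `LOADINGS`, `r`, `tetrad` and `KINDS` are hypothetical.

```python
# Invented loadings (g, v, p) for four verbal and four pictorial tests.
LOADINGS = {
    "v1": (0.7, 0.5, 0.0), "v2": (0.6, 0.4, 0.0),
    "v3": (0.5, 0.6, 0.0), "v4": (0.8, 0.3, 0.0),
    "p1": (0.6, 0.0, 0.5), "p2": (0.7, 0.0, 0.4),
    "p3": (0.5, 0.0, 0.6), "p4": (0.8, 0.0, 0.3),
}


def r(a, b):
    """Correlation of two different tests: sum of products of shared loadings."""
    return sum(x * y for x, y in zip(LOADINGS[a], LOADINGS[b]))


def tetrad(rows, cols):
    """Tetrad-difference of the 2 x 2 block with the given row and column tests."""
    (a, b), (c, d) = rows, cols
    return r(a, c) * r(b, d) - r(a, d) * r(b, c)


# One example of each of the six kinds: (row tests, column tests).
KINDS = {
    1: (("v1", "v2"), ("p1", "p2")),   # wholly in quadrant C
    2: (("v1", "v2"), ("v3", "v4")),   # wholly in quadrant V
    3: (("p1", "p2"), ("p3", "p4")),   # wholly in quadrant P
    4: (("v1", "v2"), ("v3", "p1")),   # one column reinforced by v
    5: (("p1", "p2"), ("v1", "p3")),   # one column reinforced by p
    6: (("v1", "p1"), ("v2", "p2")),   # reinforced on the diagonal
}

differences = {kind: tetrad(*margins) for kind, margins in KINDS.items()}
for kind, value in differences.items():
    print(kind, round(value, 4))
```

With these invented loadings the tetrad of kind 1 vanishes, those of kinds 2 to 5 are small, and the tetrad of kind 6 is much the largest, which is the pattern described above.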

* T. L. Kelley had already found by other methods strong evidence
of a verbal factor (Kelley, 1928, 104, 121 et passim).

10. Group-factor saturations.—Just as the g saturations
of tests can be calculated, so also can the saturation of a
test with any group factor it may contain. The general
method of the Two-factor school is first to work with
batteries of tests which give no unduly large tetrad-
differences, and which also appear to satisfy one’s general
impression that they test intelligence. From such a
battery, of which the best example is that of Brown and
Stephenson (B. and S., 1933), the g saturations can be
calculated.* Each test has, however, also its specific,
which, so long as it is in the hierarchical battery, is unique to it
and shared with no other member of the battery. A test
may now be associated with some other battery of different 
tests, and with some of these it may share a part of its 
former specific, as a group factor which will increase its 
correlation beyond that caused by g. The excess correla- 
tion enables the saturation of the test with this group 
factor to be found—the details are too technical for this 
chapter—and the specific saturation correspondingly 
reduced. Finally, the tester may be able to give the 
composition of a test as, let us say (to invent an example) :

$.71g + .40v + .34n + .47s$

where g is Spearman’s g, v is Stephenson’s verbal factor,
n is a number factor, and s is the remaining specific of the
test. The coefficients are the “saturations” of the test
with each of these; that is, the correlations believed to exist
between the test and these fictitious tests called factors.
The squares of these saturations represent the fractions of
the test-variance contributed by each factor, and these
squares sum to unity, thus :

    Factor   Saturation   Squared
      g         .71        .5041
      v         .40        .1600
      n         .34        .1156
      s         .47        .2209
                          ------
                          1.0006

* For the sake of clarity the text here rather oversimplifies the
situation. The battery of Brown and Stephenson contains in fact
a rather large group factor as well as g and specifics.


CHAPTER II 
BIFACTOR ANALYSIS AND CLUSTERS 


1. The bifactor method.—Holzinger’s Bifactor Method 
(Holzinger, 1935, 1937a) may be looked upon as another 
natural extension of the simple Two-factor plan of analysis. 
It endeavours to analyse a battery of tests into one general 
factor and a number of mutually exclusive group factors. 
A diagram of such an analysis looks like a “hollow staircase,”
thus :

    Test   g   h   k   l
     1     x   x
     2     x   x
     3     x   x
     4     x       x
     5     x       x
     6     x       x
     7     x           x
     8     x           x
     9     x           x


Here factor g runs through all, as is indicated by the 
column of crosses. Factors h, k, and l run through mutu- 
ally exclusive groups of tests each. The saturations with 
g can be calculated from sub-batteries of tests which form 
perfect hierarchies, by selecting only one test from each 
group (in every possible way). After these are known, 
the correlation due to g can be removed, and then the 
saturations due to each group factor found. 

The following artificial example will illustrate some of 
the points of this method. Consider these correlations, 
which to save space are printed without their decimal 
points : 

         1   2   3   4   5   6   7   8   9  10  11  12
     1   .  57  40  45  63  63  20  28  74  52  45  34
     2  57   .  34  25  53  39  17  44  68  43  39  56
     3  40  34   .  18  57  27  59  16  44  70  73  20
     4  45  25  18   .  27  51  09  12  32  22  20  15
     5  63  53  57  27   .  42  40  26  68  67  63  31
     6  63  39  27  51  42   .  13  18  50  34  30  23
     7  20  17  59  09  40  13   .  08  22  60  64  10
     8  28  44  16  12  26  18  08   .  35  21  18  43
     9  74  68  44  32  68  50  22  35   .  56  50  44
    10  52  43  70  22  67  34  60  21  56   .  78  25
    11  45  39  73  20  63  30  64  18  50  78   .  23
    12  34  56  20  15  31  23  10  43  44  25  23   .


There are two stages in a bifactor analysis. The first 
problem is to decide how to group the tests so that those 
are brought together which share a second or group factor. 
Then the best method of calculating is needed to find the
loadings. 

The grouping can partly be done subjectively by con- 
sidering the nature of each test and putting together 
memory tests, or tests involving number, and so on. 
Holzinger uses a “ coefficient of belonging,” B, to determine 
the coherence of a group. B is equal to the average of the 
intercorrelations of the group divided by their average 
correlation with the other tests in the battery. The higher 
B is, the more the group is distinguishable as a group. 
He begins with a pair of tests which correlate highly with 
one another, and finds their B. Then he adds a third test 
and finds the B of the three. Then another and another, 
until B drops too low. There is no fixed threshold for B, 
but a rather sudden drop would indicate the end of a 
group. 
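
The coefficient of belonging described above can be sketched as follows. This is an illustration only: the function names `r` and `belonging` are hypothetical, the matrix is the artificial one printed above (decimal points omitted, 0 standing for the empty diagonal cell), and the order in which tests are added to the starting pair is an arbitrary choice made for the demonstration.

```python
# Artificial correlations from the table above, decimal points omitted.
ROWS = [
    [0, 57, 40, 45, 63, 63, 20, 28, 74, 52, 45, 34],
    [57, 0, 34, 25, 53, 39, 17, 44, 68, 43, 39, 56],
    [40, 34, 0, 18, 57, 27, 59, 16, 44, 70, 73, 20],
    [45, 25, 18, 0, 27, 51, 9, 12, 32, 22, 20, 15],
    [63, 53, 57, 27, 0, 42, 40, 26, 68, 67, 63, 31],
    [63, 39, 27, 51, 42, 0, 13, 18, 50, 34, 30, 23],
    [20, 17, 59, 9, 40, 13, 0, 8, 22, 60, 64, 10],
    [28, 44, 16, 12, 26, 18, 8, 0, 35, 21, 18, 43],
    [74, 68, 44, 32, 68, 50, 22, 35, 0, 56, 50, 44],
    [52, 43, 70, 22, 67, 34, 60, 21, 56, 0, 78, 25],
    [45, 39, 73, 20, 63, 30, 64, 18, 50, 78, 0, 23],
    [34, 56, 20, 15, 31, 23, 10, 43, 44, 25, 23, 0],
]


def r(i, j):
    """Correlation between tests i and j (tests are numbered from 1)."""
    return ROWS[i - 1][j - 1] / 100


def belonging(group):
    """Average intercorrelation of the group divided by the average
    correlation of its members with the tests outside the group."""
    tests = range(1, len(ROWS) + 1)
    inside = [r(i, j) for i in group for j in group if i < j]
    outside = [r(i, j) for i in group for j in tests if j not in group]
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))


group = [10, 11]                      # the most highly correlated pair
for extra in (None, 3, 7, 5, 9):      # arbitrary order of addition
    if extra is not None:
        group.append(extra)
    print(group, round(belonging(group), 2))
```

A run of this kind shows B rising while the group is still coherent and then falling as less closely related tests are admitted; where to stop remains a matter of judgment, since there is no fixed threshold.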

2. Tryon’s grouping.—Another plan is to make a graph
or profile of each row of correlations and compare these
(Tryon, 1939), grouping together those tests with similar
profiles. I find it easier to consider only the peaks of each
row and compare the rows with regard to these. If we
mark, in each row of the above, the five highest correlations
in that row, and also the diagonal cell, we get the following
set of peaks :

         1   2   3   4   5   6   7   8   9  10  11  12
     1   x   x           x   x           x   x
     2   x   x           x           x   x           x
     3           x       x       x       x   x   x
     4   x   x       x   x   x           x
     5   x       x       x               x   x   x
     6   x   x       x   x   x           x
     7           x       x       x       x   x   x
     8   x   x           x           x   x           x
     9   x   x           x   x           x   x   x
    10           x       x       x       x   x   x
    11           x       x       x       x   x   x
    12   x   x           x           x   x           x


We then see that, in the rows,

    (a) Tests 3, 7, 10, 11 have identical peaks,
    (b) Tests 2, 8, 12 have identical peaks,
    (c) Tests 4, 6 have identical peaks,

and we take these as nuclei for three groups. There remain
Tests 1, 5, and 9. Their average correlations with
each of the above nuclei are :

             a     b     c
      1     .39   .40   .54
      5     .57   .37   .35
      9     .43   .49   .41
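
The comparison of peaks, and the averages with the three nuclei, can be reproduced mechanically. The sketch below is an illustration under assumptions: the function names `peaks` and `average_with` are hypothetical, and a tie for fifth place is resolved by marking every tied cell, which is one possible convention rather than a rule taken from the text.

```python
# Artificial correlations from the table above, decimal points omitted.
ROWS = [
    [0, 57, 40, 45, 63, 63, 20, 28, 74, 52, 45, 34],
    [57, 0, 34, 25, 53, 39, 17, 44, 68, 43, 39, 56],
    [40, 34, 0, 18, 57, 27, 59, 16, 44, 70, 73, 20],
    [45, 25, 18, 0, 27, 51, 9, 12, 32, 22, 20, 15],
    [63, 53, 57, 27, 0, 42, 40, 26, 68, 67, 63, 31],
    [63, 39, 27, 51, 42, 0, 13, 18, 50, 34, 30, 23],
    [20, 17, 59, 9, 40, 13, 0, 8, 22, 60, 64, 10],
    [28, 44, 16, 12, 26, 18, 8, 0, 35, 21, 18, 43],
    [74, 68, 44, 32, 68, 50, 22, 35, 0, 56, 50, 44],
    [52, 43, 70, 22, 67, 34, 60, 21, 56, 0, 78, 25],
    [45, 39, 73, 20, 63, 30, 64, 18, 50, 78, 0, 23],
    [34, 56, 20, 15, 31, 23, 10, 43, 44, 25, 23, 0],
]
TESTS = range(1, 13)


def peaks(i, count=5):
    """The tests holding the `count` highest correlations in row i,
    together with the diagonal cell i itself (ties are all kept)."""
    row = {j: ROWS[i - 1][j - 1] for j in TESTS if j != i}
    cutoff = sorted(row.values(), reverse=True)[count - 1]
    return frozenset(j for j, v in row.items() if v >= cutoff) | {i}


# Put together the tests whose sets of peaks are identical.
profiles = {}
for i in TESTS:
    profiles.setdefault(peaks(i), []).append(i)
nuclei = [tests for tests in profiles.values() if len(tests) > 1]
print(nuclei)   # [[2, 8, 12], [3, 7, 10, 11], [4, 6]]


def average_with(test, nucleus):
    """Average correlation of one test with the tests of a nucleus."""
    return sum(ROWS[test - 1][j - 1] for j in nucleus) / (100 * len(nucleus))


for test in (1, 5, 9):
    print(test, [round(average_with(test, n), 2) for n in nuclei])
```

Note that the nuclei are printed in the order in which they are first met (b, a, c), so the averages appear in that order too.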


We therefore add Test 1 to group c, Test 5 to group a,
and (less certainly) Test 9 to group b. We then rewrite
our matrix with the tests thus grouped :

  Test   3   5   7  10  11 |   2   8   9  12 |   1   4   6 |  Sum a  Sum b  Sum c
     3   .  57  59  70  73 |  34  16  44  20 |  40  18  27 |          1.14    .85
     5  57   .  40  67  63 |  53  26  68  31 |  63  27  42 |          1.78   1.32
     7  59  40   .  60  64 |  17  08  22  10 |  20  09  13 |           .57    .42
    10  70  67  60   .  78 |  43  21  56  25 |  52  22  34 |          1.45   1.08
    11  73  63  64  78   . |  39  18  50  23 |  45  20  30 |          1.30    .95
Totals                     |                 |             |          6.24   4.62

     2  34  53  17  43  39 |   .  44  68  56 |  57  25  39 |   1.86          1.21
     8  16  26  08  21  18 |  44   .  35  43 |  28  12  18 |    .89           .58
     9  44  68  22  56  50 |  68  35   .  44 |  74  32  50 |   2.40          1.56
    12  20  31  10  25  23 |  56  43  44   . |  34  15  23 |   1.09           .72
Totals                     |                 |             |   6.24          4.07

     1  40  63  20  52  45 |  57  28  74  34 |   .  45  63 |   2.20   1.93
     4  18  27  09  22  20 |  25  12  32  15 |  45   .  51 |    .96    .84
     6  27  42  13  34  30 |  39  18  50  23 |  63  51   . |   1.46   1.30
Totals                     |                 |             |   4.62   4.07

It will be seen that certain additions have been made in
readiness for the various methods of calculation of the g
loadings which are then possible: each row carries the sums
of its correlations with the tests of the other two groups,
and each block of rows the totals of those sums. If we
symbolize the table as

    A  D  E
    D  B  F
    E  F  C

all methods depend on using only the correlations in the
rectangles D, E, and F, since the suspected group factors
which increase the correlations in A, in B, and in C do not
influence D, E, and F. Each correlation in the latter
rectangles is therefore the product of two g-saturations
(see page 9). Thus :


$r_{31} = .40 = l_3 l_1$
$r_{32} = .34 = l_3 l_2$
$r_{12} = .57 = l_1 l_2$

$l_3^2 = \frac{.40 \times .34}{.57} = .24, \quad l_3 = .49$

where it should be noted that the three correlations come 
from E, D, and F respectively. 

But this value for the loading of Test 3 depends upon three
correlations only and would, in a real experimental set of
data, vary somewhat with our choice of the three. A
method of using all the possible correlations in these three
rectangles is needed. One such is given by Holzinger in
his Manual (1937a).

3. Holzinger’s formula.—If all possible ways of choosing
the two other tests are taken, and the fraction

$\frac{r_{3i} r_{3j}}{r_{ij}}$

formed in each case; and if the numerators of these fractions are
added together to form a global numerator, and their
denominators to form a global denominator; it will then
be found that the fraction thus formed is equal to

$l_3^2 = \frac{1.14 \times .85}{4.07} = .24, \quad l_3 = .49$

and this time all available correlations have been used.
The rule is to multiply the two totals in the row of the
test (1.14 × .85) and divide by the grand total of the
block formed by the other tests concerned (1, 4, and 6
with 2, 8, 9, and 12, i.e. 4.07). For Test 2 this rule gives

$l_2^2 = \frac{1.86 \times 1.21}{4.62} = .49, \quad l_2 = .70.$

This Holzinger method is not difficult to extend to four
or more groups. If we symbolize a four-group matrix by

    A  D  E  G
    D  B  F  H
    E  F  C  K
    G  H  K  L

and consider the first test, then its g-loading l is given by

$l^2 = \frac{de + dg + eg}{F + H + K}$

where d, e, and g are the sums of its row in D, E, and G,
and F, H, and K stand for the grand totals of those blocks.
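
Holzinger’s rule, in its three-group form and in the extension to more groups, can be sketched in a few lines. This is a minimal illustration, not a transcription of the Manual: the function name `holzinger_loading` and the way the totals are keyed by pairs of group letters are assumptions made here, while the figures are the row sums and block totals quoted in the text.

```python
from itertools import combinations
from math import sqrt


def holzinger_loading(row_sums, block_totals):
    """g loading of one test.

    row_sums     -- {group: sum of the test's correlations with that group},
                    for every group other than the test's own
    block_totals -- {frozenset of two groups: grand total of their block}
    """
    pairs = list(combinations(sorted(row_sums), 2))
    numerator = sum(row_sums[p] * row_sums[q] for p, q in pairs)
    denominator = sum(block_totals[frozenset((p, q))] for p, q in pairs)
    return sqrt(numerator / denominator)


# Block totals of the artificial example: D = a with b, E = a with c,
# F = b with c.
TOTALS = {
    frozenset("ab"): 6.24,
    frozenset("ac"): 4.62,
    frozenset("bc"): 4.07,
}

l3 = holzinger_loading({"b": 1.14, "c": 0.85}, TOTALS)   # Test 3, group a
l2 = holzinger_loading({"a": 1.86, "c": 1.21}, TOTALS)   # Test 2, group b
print(round(l3, 2), round(l2, 2))   # 0.49 0.7
```

With three groups there is only one pair of other groups, so the function reduces to the product of the two row totals divided by one block total; with four groups it forms the three products and the three block totals of the formula above.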

4. Burt’s formula.—Another method is given by Burt
(1940, 478). For the numerator of each g loading he takes
the sum of the side totals which Holzinger multiplied.
Thus the numerators are :

    for Test  3,   1.14 +  .85 = 1.99
    for Test  5,   1.78 + 1.32 = 3.10
    for Test  2,   1.86 + 1.21 = 3.07
    for Test 12,   1.09 +  .72 = 1.81
    for Test  6,   1.46 + 1.30 = 2.76.


The denominators differ in group a, group b, and group c,
but all are formed from the three quantities 6.24, 4.62,
and 4.07. For group a the denominator is :

$\sqrt{4.07 \left\{ \frac{6.24}{4.62} + \frac{4.62}{6.24} + 2 \right\}} = 4.08$

It will be seen that the two quantities within the curly 
brackets are the totals of D and E, the two rectangles 
from which the numerators of group a come. By analogy 
the reader can write down the denominators of group b 
and group c—they come to 4-40 and 5-01. Dividing the 
numerators by the appropriate denominators, we get for 
the g loadings : 

    Test         3    5    7   10   11    2    8    9   12    1    4    6
    g Loading   .49  .76  .24  .62  .55  .70  .33  .90  .41  .82  .36  .55
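
The whole of this calculation can be repeated from the side totals and the three block totals. The sketch below is hedged in one important respect: the explicit form of the denominator, the square root of one block total multiplied by the sum of the two ratios of the other totals plus 2, is an assumption adopted here because it reproduces the denominators 4.40 and 5.01 quoted above; Burt’s own text should be consulted for his statement of it. The names `burt_denominator`, `SIDE_TOTALS` and `PRINTED` are hypothetical.

```python
from math import sqrt

# Block totals of the artificial example.
D, E, F = 6.24, 4.62, 4.07   # a with b, a with c, b with c


def burt_denominator(other, first, second):
    """Assumed form of the denominator for one group.

    other         -- total of the block that does not involve the group
    first, second -- totals of the two blocks that do involve it
    """
    return sqrt(other * (first / second + second / first + 2))


DENOMINATOR = {
    "a": burt_denominator(F, D, E),
    "b": burt_denominator(E, D, F),
    "c": burt_denominator(D, E, F),
}

# Group and the two side totals of every test, read from the grouped table.
SIDE_TOTALS = {
    3: ("a", 1.14, 0.85), 5: ("a", 1.78, 1.32), 7: ("a", 0.57, 0.42),
    10: ("a", 1.45, 1.08), 11: ("a", 1.30, 0.95),
    2: ("b", 1.86, 1.21), 8: ("b", 0.89, 0.58), 9: ("b", 2.40, 1.56),
    12: ("b", 1.09, 0.72),
    1: ("c", 2.20, 1.93), 4: ("c", 0.96, 0.84), 6: ("c", 1.46, 1.30),
}

loadings = {
    test: (x + y) / DENOMINATOR[group]
    for test, (group, x, y) in SIDE_TOTALS.items()
}

# The loadings as printed in the text, for comparison.
PRINTED = {3: .49, 5: .76, 7: .24, 10: .62, 11: .55, 2: .70,
           8: .33, 9: .90, 12: .41, 1: .82, 4: .36, 6: .55}

for test, value in loadings.items():
    print(test, round(value, 3), PRINTED[test])
```

The computed loadings agree with the printed ones to within the rounding of the second decimal place.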


The proof of Burt’s formula is surprisingly easy. If the
reader will write down, in place of the correlations in D,
E, and F, the literal symbols $l_i l_j$ (for $r_{ij}$), since our
hypothesis is that only g is concerned in these correlations,
and will write out the sums, etc., of the above calculation
literally, he will find that Burt’s formula simplifies almost
immediately to one l, that of the test in question. Burt
only gives his formula for three groups. It can be extended
to the case of more groups, but becomes cumbersome and
rather unwieldy.
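
The literal working suggested above might run as follows. It is a sketch under two assumptions that are not in the text: the symbols $S_a$, $S_b$, $S_c$ are introduced here for the sums of the g loadings in groups a, b, and c, and the denominator for group a is taken in the form $\sqrt{F\{D/E + E/D + 2\}}$, where D, E, and F now stand for the totals of those rectangles.

```latex
% S_a, S_b, S_c: sums of the g loadings in groups a, b, c (notation assumed here)
\begin{align*}
\text{block totals:} \quad
  & D = S_a S_b, \qquad E = S_a S_c, \qquad F = S_b S_c \\
\text{numerator for test } i \text{ of group } a: \quad
  & l_i S_b + l_i S_c = l_i \,(S_b + S_c) \\
\text{denominator squared:} \quad
  & F\left\{\frac{D}{E} + \frac{E}{D} + 2\right\}
    = S_b S_c\left\{\frac{S_b}{S_c} + \frac{S_c}{S_b} + 2\right\}
    = (S_b + S_c)^2 \\
\text{quotient:} \quad
  & \frac{l_i \,(S_b + S_c)}{S_b + S_c} = l_i
\end{align*}
```

The same steps, with the letters permuted, serve for groups b and c.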

5. The test of correct grouping.—Now comes the test of
whether our grouping is correct, and our hypothesis valid
that groups a, b, and c have nothing in common but the
factor g. Using the loadings we have found, form all the
products $l_i l_j$ and subtract them from the experimental
correlations. All the correlations in D, E, and F should
then vanish or, in a real set of data (ours are artificial),
become insignificant. There should, however, remain
residues in A, B, and C due to the second factors running
through groups a, b, and c respectively. In our example
the subtraction of the quantities $l_i l_j$ gives the residues
shown below.

  Test     g   3   5   7  10  11 |   2   8   9  12 |   1   4   6
     3   .49   .  20  47  40  46 |                 |
     5   .76  20   .  22  20  21 |                 |
     7   .24  47  22   .  45  51 |                 |
    10   .62  40  20  45   .  44 |                 |
    11   .55  46  21  51  44   . |                 |
     2   .70                     |   .  21  05  27 |
     8   .33                     |  21   .  05  29 |
     9   .90                     |  05  05   .  07 |
    12   .41                     |  27  29  07   . |
     1   .82                     |                 |   .  15  18
     4   .36                     |                 |  15   .  31
     6   .55                     |                 |  18  31   .

The correlations left in A, if they are due to only one
other factor (now that g has been removed), ought to show
zero or very small tetrads ; and so they do. Those in B
are also hierarchical. Those in C are too few to form a
tetrad. The second factor in each of these submatrices
can now be found in the same way as g is found from a
matrix with no other factor : see page 9 and, later in this
book, pages 42 to 44. The reader should complete the
calculation, and will find these loadings :


                 Factors
    Test     g     u     v     w
      3     .49   .65
      5     .76   .31
      7     .24   .72
     10     .62   .62
     11     .55   .71
      2     .70         .44
      8     .33         .47
      9     .90         .11
     12     .41         .62
      1     .82               .29
      4     .36               .50
      6     .55               .62

An actual set of data will not give so perfect a hollow
staircase, but at this stage the strict bifactor hypothesis
can be departed from and additional small loadings or
further factors added to perfect the analysis. Where a 
bifactor pattern exists, a simple method of extracting 
correlated or oblique factors has been given by Holzinger 
(1944) “based on the idea that the centroid pattern 
coefficients for the sections of approximately unit rank 
may be interpreted as structure values for the entire 
matrix.” 
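
The calculation left to the reader in the artificial example, removing g and then finding the second factor, can be sketched for group c (Tests 1, 4, and 6). This is an illustration only; the names `residue` and `triad` are hypothetical, and because the g loadings are used as rounded to two decimals, the results may differ from tabulated loadings by a unit or two in the second decimal place.

```python
from math import sqrt

# g loadings and observed correlations of the three tests of group c.
g = {1: 0.82, 4: 0.36, 6: 0.55}
r = {(1, 4): 0.45, (1, 6): 0.63, (4, 6): 0.51}

# Remove the part of each correlation that is due to g.
residue = {(i, j): r[(i, j)] - g[i] * g[j] for (i, j) in r}


def triad(res, i, j, k):
    """Loading of test i on the second factor, from the three residues
    among tests i, j, k (the same rule as for g in a hierarchy)."""
    def key(a, b):
        return (min(a, b), max(a, b))
    return sqrt(res[key(i, j)] * res[key(i, k)] / res[key(j, k)])


w = {}
for test in g:
    others = [t for t in g if t != test]
    w[test] = triad(residue, test, *others)

for pair, value in residue.items():
    print("residue", pair, round(value, 2))
for test, value in w.items():
    print("w loading of Test", test, round(value, 2))
```

By construction the products of the loadings so found reproduce the three residues exactly, which is all that three tests can determine.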

6. Cluster analysis.—This is connected with the bifactor 
method, which is possible when clusters do not overlap. 
But it is by no means rare to find two or three variables 
entering into several distinct clusters. Raymond Cattell’s 
article (1944a) describes four methods of determining 
clusters, and gives references which will lead the interested 
reader back to much of the previous work, and see also 
Tryon’s work Cluster Analysis, 1939. The most naive 
method of classifying tests into clusters, one needing no 
mathematics whatever, is simply to put together all the 
tests which intercorrelate above a certain level. We can 
illustrate this adequately on the above example. Let us 
collect into clusters tests which correlate with one another 
at least 0.40. A routine is desirable to ease the task and
avoid overlooking any clusters. Turn to the table on
page 20 and write down from the first row all the tests
which have correlations of 0.40 or more with Test 1,
including itself,

    1   2   3   4   5   6   9  10  11
        2   5   9  10
            5   9  10
                9  10
                   10

    Cluster A, Tests 1, 2, 5, 9, 10.

Then consider the test next to No. 1 in this line, which
happens to be Test 2, and go along its line in the correlation
table to see which of the tests already noted also correlates
sufficiently with Test 2. They are 5, 9, and 10. The
other tests of our first line drop out. We then look along
the line of Test 5’s correlation coefficients, and find that
Tests 9 and 10 survive this scrutiny. Finally, we note
that Tests 9 and 10 themselves correlate enough. The
cluster A is therefore (reading down the left-hand edge of
the above triangular set of notes) composed of Tests 1, 2,
5, 9, and 10. At this point, to avoid missing other clusters
which may begin with Test 1, it is necessary to consider
what would have happened had Test 2 not been in the
battery. It would be tedious to describe the whole procedure
here, but the reader is urged to go through it, when
he will find six clusters, shown in this diagram.


Figure 7. 
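
The routine described for Cluster A can be written out as a short sketch. It covers only the single pass described in the text, with an optional list of tests to be treated as absent; it does not attempt the complete search. The table `NEIGHBOURS` lists, for each test of the artificial example, the tests correlating with it at 0.40 or more, and the name `first_cluster` is hypothetical.

```python
# For each test, the tests correlating with it at 0.40 or more
# (read from the artificial correlation table).
NEIGHBOURS = {
    1: {2, 3, 4, 5, 6, 9, 10, 11},
    2: {1, 5, 8, 9, 10, 12},
    3: {1, 5, 7, 9, 10, 11},
    4: {1, 6},
    5: {1, 2, 3, 6, 7, 9, 10, 11},
    6: {1, 4, 5, 9},
    7: {3, 5, 10, 11},
    8: {2, 12},
    9: {1, 2, 3, 5, 6, 10, 11, 12},
    10: {1, 2, 3, 5, 7, 9, 11},
    11: {1, 3, 5, 7, 9, 10},
    12: {2, 8, 9},
}


def first_cluster(start, skip=()):
    """One pass of the routine: note the tests linked to `start`, then
    repeatedly take the next noted test and strike out those not linked
    to it. Tests named in `skip` are treated as absent from the battery."""
    members = [start]
    survivors = sorted(NEIGHBOURS[start] - set(skip))
    while survivors:
        nxt = survivors.pop(0)
        members.append(nxt)
        survivors = [t for t in survivors if t in NEIGHBOURS[nxt]]
    return members


print(first_cluster(1))             # [1, 2, 5, 9, 10]
print(first_cluster(1, skip=[2]))   # the pass made as if Test 2 were absent
```

Each line of the triangular set of notes corresponds to one turn of the loop, and the members collected are the left-hand edge of the triangle.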


7. Comparison with the bifactor groups.—If we compare 
these clusters with the grouping we found by Tryon’s 
method of profiles (or peaks), we see that our present clusters 
F, E, and C are those we arrived at formerly (except for the 
absence of Test 9 from cluster E). And we notice also 
that in our diagram these are mutually exclusive clusters. 
The missing Test 9 is the one we formerly had most doubt 
about classifying. The reason can be seen from the analy- 
sis we have already made. It is highly saturated with the 
general factor, and only very weakly with the verbal 
factor which decides its bifactor group. 

8. A less artificial example.—The above example was an
artificial one, made so as to “come out” exactly. Let us
turn to a more realistic example where this is not the case.
The following correlations (decimal points are again
omitted) are from an actual report, but to obviate some
embarrassments in a didactic example I have made all the
coefficients rather larger than they actually were. The
first seven “tests” are examinations in school subjects,
the next four are “non-verbal” tests with simple pieces
of apparatus, and the last three are special tests supposed
to be uncontaminated by any group factor other than g,
v, and k (the “space” factor).


                   1   2   3   4   5   6   7   8   9  10  11  12  13  14
 1 Physics         .  76  82  68  64  40  28  44  19  16  21  45  11  10
 2 Chemistry      76   .  68  62  52  26  26  43  36  29  23  38  15  13
 3 Mathematics    82  68   .  68  47  48  21  37  23  13  20  43  19  18
 4 French         68  62  68   .  45  23  34  29  25 -13  05  26  34  00
 5 Mech. Draw.    64  52  47  45   .  36  17  53  55  38  21  36  07  42
 6 Problems       40  26  48  23  36   .  19  51  47  20  40  47  05  36
 7 Reading        28  26  21  34  17  19   .  09  07  02  17 -07  38  03
 8 Koh’s Blocks   44  43  37  29  53  51  09   .  81  50  50  64  43  65
 9 Cube Constr.   19  36  23  25  55  47  07  81   .  42  53  53  37  66
10 Form Board     16  29  13 -13  38  20  02  50  42   .  52  34  19  38
11 Passalong      21  23  20  05  21  40  17  50  53  52   .  32  32  46
12 g test         45  38  43  26  36  47 -07  64  53  34  32   .  40  57
13 v test         11  15  19  34  07  05  38  43  37  19  32  40   .  45
14 k test         10  13  18  00  42  36  03  65  66  38  46  57  45   .

When by the above method we sort these tests into clusters,
using 0.40 as boundary line, we obtain the following
diagram :


Figure 8, 


In passing, we may note that this di 
what Raymond Cattell (1946) 
one which forms the centre of 
clusters. Here the pair 8 an 
occur together in clusters BAG;
nuclear cluster. For bifactor analysis, however, we want
non-overlapping clusters.

9. A first attempt at grouping.—Searching in this diagram 
for at least three non-overlapping contours, we find 
clusters A, F, and either C or D. Of the alternatives let 
us take D, and rewrite our table of correlations with these 
clusters separated. This leaves Tests 6 and 7 out of the 
picture, and further study of the diagram leads us also to 
omit 5, which is linked with both F and D through cluster B. 
Our table, and its calculations, then is as follows : 


  Test   1   2   3   4 |   8   9  10  11 |  12  13  14 |   Sum 1-4  Sum 8-11 Sum 12-14
     1   .  76  82  68 |  44  19  16  21 |  45  11  10 |                1.00       .66
     2  76   .  68  62 |  43  36  29  23 |  38  15  13 |                1.31       .66
     3  82  68   .  68 |  37  23  13  20 |  43  19  18 |                 .93       .80
     4  68  62  68   . |  29  25 -13  05 |  26  34  00 |                 .46       .60
Totals                 |                 |             |                3.70      2.72

     8  44  43  37  29 |   .  81  50  50 |  64  43  65 |      1.53                1.72
     9  19  36  23  25 |  81   .  42  53 |  53  37  66 |      1.03                1.56
    10  16  29  13 -13 |  50  42   .  52 |  34  19  38 |       .45                 .91
    11  21  23  20  05 |  50  53  52   . |  32  32  46 |       .69                1.10
Totals                 |                 |             |      3.70                5.29

    12  45  38  43  26 |  64  53  34  32 |   .  40  57 |      1.52      1.83
    13  11  15  19  34 |  43  37  19  32 |  40   .  45 |       .79      1.31
    14  10  13  18  00 |  65  66  38  46 |  57  45   . |       .41      2.15
Totals                 |                 |             |      2.72      5.29


From this table, by Holzinger’s formula, we obtain the
g loadings shown at the right of the next table. For
example :

$l_{10}^2 = \frac{0.45 \times 0.91}{2.72} = .15055, \quad l_{10} = .388$

When, using these g loadings, we remove the parts of the
correlations due to that factor, we get the following table
of residues. For example :

$.76 - .353 \times .404 = .62.$

                              Residues

  Test   1   2   3   4 |   8   9  10  11 |  12  13  14 | Loadings
     1   .  62  69  60 |  09 -08  02  02 |  14 -08 -07 |   .353
     2  62   .  53  53 |  03  05  13  02 |  03 -06 -07 |   .404
     3  69  53   .  59 |  00 -06 -02  00 |  10 -01  00 |   .375
     4  60  53  59   . |  07  07 -22 -07 |  06  22 -11 |   .228
     8  09  03  00  07 |   .  05  12 -02 | -21 -09  17 |   .984
     9 -08  05 -06  07 |  05   .  12  12 | -14 -04  28 |   .769
    10  02  13 -02 -22 |  12  12   .  32 |  00 -02  19 |   .388
    11  02  02  00 -07 | -02  12  32   . | -14  04  20 |   .528
    12  14  03  10  06 | -21 -14  00 -14 |   . -06  15 |   .867
    13 -08 -06 -01  22 | -09 -04 -02  04 | -06   .  19 |   .529
    14 -07 -07  00 -11 |  17  28  19  20 |  15  19   . |   .488


On examining these residues, however, we see that this
time our hypothesis, that the clusters are exclusive with
regard to their second group factors, is not justified. True,
many of the residues in the side squares are very small.
But two facts strike the eye: Test 14 (the k or space
factor test) has quite large residues with the middle or non-
verbal group, and Tests 10 and 11 (Form Board and
Passalong) have a much larger residue than the other
tests in the middle square. These facts suggest further
purging the battery of 14 and either 10 or 11. It is very
frequently necessary to “purge” a battery before the
proper loadings of the remaining tests can be ascertained.

10. The purged battery.—When we do this (the reader
should rewrite the tables and carry out the work), we get
the following loadings and residues :

                         Residues

  Test   1   2   3   4 |   8   9  11 |  12  13 | Loadings
     1   .  57  64  52 |  08 -08  02 |  10 -11 |   .424
     2  57   .  48  45 |  05  07  03 |  00 -09 |   .455
     3  64  48   .  52 |  00 -05  01 |  07 -04 |   .436
     4  52  45  52   . | -02  02 -11 | -05  15 |   .368
     8  08  05  00 -02 |   .  28  13 | -06 -01 |   .842
     9 -08  07 -05  02 |  28   .  25 |  00  04 |   .633
    11  02  03  01 -11 |  13  25   . | -04  09 |   .437
    12  10  00  07 -05 | -06  00 -04 |   .  13 |   .835
    13 -11 -09 -04  15 | -01  04  09 |  13   . |   .522

This table is much more like our artificial model. None
of the correlation coefficients in the side squares are far
from zero ; we shall learn later how to decide whether they
are, in fact, small enough to be ignored. Meanwhile, let us
assume this, and suppose, that is to say, that these three
groups of tests really are exclusive of one another in their
second group factors. Their loadings in these we could
then proceed to calculate. This is easily done in the middle
group, where there are exactly three tests. We have:


$m_8^2 = \frac{.28 \times .13}{.25} = .1456, \quad m_8 = .382$

$m_9^2 = \frac{.28 \times .25}{.13} = .5384, \quad m_9 = .734$

$m_{11}^2 = \frac{.25 \times .13}{.28} = .1161, \quad m_{11} = .341$

The equations of these three tests are therefore:

$z_8 = .842g + .382h + .383 s_8$
$z_9 = .633g + .734h + .246 s_9$
$z_{11} = .437g + .341h + .832 s_{11}$


where the group factor common to them is given the non- 
committal name h. The coefficients of the specifics are 
settled by the fact that the sum of the squares of the co- 
efficients of such an equation (since the factors are inde- 
pendent) must equal unity. It will be noticed that Test 11 
(Passalong) has here a large specific. It probably shares a 
good deal of this with Test 10 (Form Board) which we 
excluded from the battery meanwhile for this very reason.* 
We cannot similarly calculate the group factor loadings of
the third group of tests, for there are only two of them and
three tests are necessary. We only know that the product
of their two group factor loadings is .138. This emphasizes
the necessity, in planning a bifactor battery, to have a
sufficient number of tests. There must be at least three
groups, and at least three tests in each group.

* It should be repeated at this point that this example is purely
illustrative, and no conclusions about actual tests may be drawn
from this or from any of our examples. This is a book about
factorial methods, not results.

THE FACTORIAL ANALYSIS OF HUMAN ABILITY
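
The calculation for the middle group, and the settling of the specific coefficients, can be sketched thus. It is an illustration only: the names `group_loading` and `specific` are hypothetical, the residues and g loadings are those of Tests 8, 9, and 11 in the purged battery, and the specific coefficient is obtained from the condition that the squares of the three coefficients sum to unity.

```python
from math import sqrt

# Residues among the three tests of the middle group, and their g loadings.
residue = {(8, 9): 0.28, (8, 11): 0.13, (9, 11): 0.25}
g = {8: 0.842, 9: 0.633, 11: 0.437}


def group_loading(test):
    """Loading on the group factor h: the product of the two residues that
    involve the test, divided by the one that does not, then square-rooted."""
    involving = [v for pair, v in residue.items() if test in pair]
    remaining = [v for pair, v in residue.items() if test not in pair]
    return sqrt(involving[0] * involving[1] / remaining[0])


def specific(test):
    """Coefficient of the specific, so that the squares sum to unity."""
    return sqrt(1 - g[test] ** 2 - group_loading(test) ** 2)


for test in g:
    print(test, g[test], round(group_loading(test), 3), round(specific(test), 3))
```

Since the loadings are carried here without intermediate rounding, the last figure of a specific coefficient may differ by a unit or two from a value worked with rounded loadings.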

The first group has four tests, and our first step should be 
to see whether its tetrad-differences are zero. If they were 
exactly zero, it would be immaterial which three of the 
four tests we chose to calculate loadings from. Here the 
tetrad-differences, though small (-0084, -0384, -0468), are 
not exactly zero. We shall defer to the next chapter 
(page 48) the question of how to make the best estimate 
of the loadings under these circumstances, but the reader 
might care to calculate them from every possible three of 
the four tests and average the results. Our illustration has 
served its purpose of bringing to light difficulties which do 
not exist in an artificial example made to avoid raising 
them. 
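
The three tetrad-differences of the first group, and the averaging suggested to the reader, can be sketched as follows. The figures are the residues among Tests 1 to 4 in the purged battery; the names `r` and `averaged_loading` are hypothetical, and the plain average over the three possible triads is offered only as the rough expedient mentioned above, not as the best estimate, which is deferred to the next chapter.

```python
from itertools import combinations
from math import sqrt

# Residues among Tests 1, 2, 3, 4 after the removal of g (purged battery).
R = {(1, 2): 0.57, (1, 3): 0.64, (1, 4): 0.52,
     (2, 3): 0.48, (2, 4): 0.45, (3, 4): 0.52}


def r(i, j):
    """Residue between tests i and j, in either order."""
    return R[(min(i, j), max(i, j))]


tetrads = [
    r(1, 2) * r(3, 4) - r(1, 3) * r(2, 4),
    r(1, 3) * r(2, 4) - r(1, 4) * r(2, 3),
    r(1, 2) * r(3, 4) - r(1, 4) * r(2, 3),
]
print([round(t, 4) for t in tetrads])   # [0.0084, 0.0384, 0.0468]


def averaged_loading(test):
    """Average of the loadings given by every possible three of the four
    tests that includes `test`."""
    others = [t for t in (1, 2, 3, 4) if t != test]
    estimates = [sqrt(r(test, j) * r(test, k) / r(j, k))
                 for j, k in combinations(others, 2)]
    return sum(estimates) / len(estimates)


for test in (1, 2, 3, 4):
    print(test, round(averaged_loading(test), 3))
```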




CHAPTER III 


SAMPLING ERROR AND THE THEORY OF TWO 
FACTORS 


1. Sampling error—The general idea underlying the 
notion of a sampling error is not a difficult one. Take, for 
example, the average height of all living Englishmen who
are of full age. This could, if need be, be ascertained by the
process of measuring every living Englishman of full age. 
Actually this has never been done, and when anyone makes 
a statement such as “ The average height of Englishmen is 
67½ inches,” he is basing it upon a sample only. This
sample may not be an unbiased one. Indeed, samples of 
Englishmen whose height has been officially recorded are 
heavily loaded with certain classes of Englishmen—for 
example, prisoners in gaol, and unemployed young men 
joining the army of preconscription days. The average 
height of such men may well differ from that of all English- 
men. But when we speak of sampling error, we do not 
mean error due to the sample being known to be a biased 
one. Even if the sample of Englishmen used to find the 
average height of their race were, as far as could be seen, a 
perfectly fair sample, containing the proper proportion of 
all classes of the community and of all adult ages, etc., it
yet would not necessarily yield an average exactly equal 
to that of all Englishmen. Several apparent replicas of the 
sample would yield different averages. It is these differ- 
ences, between statistics gathered from different but 
equally good samples, that we mean by sampling errors. 

It is worth while calling attention at this point to a 
general fact which will be found of importance at a later 
stage of this book. The true average height of Englishmen 
is only so by definition, and does not in principle differ 
from the average of a sample. We had to define the population
we had in mind as “all living Englishmen of full
age.” This is a perfectly well-marked body of men. But
it is itself in its turn only a sample: a sample of all living
Europeans, or all living men. It is, indeed, altering daily 
and hourly as men die or reach the age of 21, and each 
generation is a sample of those that have been and may be. 
Those who reach the age of 21 are only some, and therefore 
only a sample, of those born. And even those born are 
only a sample of those who might have been born had 
times been better or had there been no war, or a tax on 
bachelors. So the idea of sampling is a relative one, and 
the “complete population ” from which we take samples 
is a matter of definition only. The mathematical problem 
in connexion with sampling which it is desirable to solve 
if possible for each statistic is to find the complete law of 
its distribution when it is derived from each of a large 
number of samples of a given size. Mathematically this 
is often very difficult, and frequently we have to be 
content with a formula which gives its approximate 
variance if certain assumptions are allowed and certain 
small quantities are neglected. 

Sampling problems are of two kinds, direct and inverse. 
The easier kind of problem is to say what the distribution 
of a statistic will be in samples of a given size when we 
know all about the true values in the whole population : 
the more difficult kind is to estimate what the true value 
of a statistic is in a complete population when we know 
its observed value in certain samples. They differ as 
do problems of interpolation and extrapolation. As an 
example of the direct kind of problem, let us suppose that 
we actually knew the height of every adult Englishman 
of full age. We could then, on being told a certain sample 
of p Englishmen averaged such and such a height, calculate 
the probability that this sample was a random sample, a 
probability that would obviously grow less as the average 
of the sample departed from the average of the whole 
population. It would also depend on the size of the sample, 
for if a very large sample deviates far from the true average, 
it is less likely to be random, more likely to have some 
reason for the difference, than a small sample with the 
same average would have. 

2. Standard errors.—By the distribution of a certain
variable in the population we mean the curve (usually
expressed as an equation) showing its frequency of occurrence
for each possible value. Thus the curve in Figure 9
might show the distribution of height in living adult
Englishmen, by its height above the base line at each point.
More men (represented by the line MN) have the average
height, 67½ inches, than have the height 73 inches, the
frequency of the latter being shown by the line PQ. The 
shaded area represents all men whose height is 73 inches 
or more, and its ratio to the area under the whole curve 
is the probability that an Englishman taken absolutely at 
random will have a height of 73 inches or more. 

Very often distributions are, at any rate approximately,
of a certain shape called the “normal curve.” The normal
curve has a known equation, it is symmetrical about its
mid point, and with the aid of published tables can be
drawn accurately (or reproduced arithmetically) if we know
the mid point M (which is the average of the measurements)
and a certain distance ST or S′T (which is equal to
the standard deviation of the measurements).
S and S′ are the points where the curve changes from
being convex to being concave.

Figure 9. [Curve of the distribution of height, the base line
marked in inches from 61 to 76, with the ordinates MN and PQ
and the points S and S′.]

If the distribution of a variable, say the heights of adult
Englishmen, is “normal,” then the distribution of the
means of samples of p Englishmen’s heights will also be
normal, but will be more closely concentrated about the
point M than are the measurements of individuals: in
point of fact, its variance will be p times smaller, its
standard deviation thus $\sqrt{p}$ times smaller. That is to
say, if we take sample after sample of 25 Englishmen
each time, and for each sample record the average height,
the means thus accumulated will be distributed in a curve
of the same shape as that of Figure 9, but narrower from
side to side, so that SS′ would be one-fifth ($1/\sqrt{25}$) of what
it is in Figure 9, which is the distribution of single
measurements.
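
The narrowing of the curve for means can be seen in a small simulation. This is a sketch under assumptions: the standard deviation of 2.5 inches for individual heights is invented for the illustration (the text gives only the mean), and the heights are drawn from a normal distribution with a fixed seed so that the run can be repeated.

```python
import random
import statistics
from math import sqrt

random.seed(0)

MEAN = 67.5    # average height in inches, as in the text
SD = 2.5       # standard deviation of individuals: an invented figure
P = 25         # number of men in each sample

# Record the mean of each of 2000 samples of P men.
means = [
    statistics.fmean(random.gauss(MEAN, SD) for _ in range(P))
    for _ in range(2000)
]

observed = statistics.stdev(means)   # spread of the sample means
expected = SD / sqrt(P)              # one-fifth of SD when P = 25

print(round(observed, 3), expected)
```

The observed spread of the means should lie close to one-fifth of the spread of individuals, though it will not equal it exactly, being itself subject to sampling error.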

If a sample were made with some special end in view, 
such as ascertaining whether red-headed men tend to be 
tall, we would decide whether we had detected such a 
tendency by calculating the probability that a mean such 
as our red-headed sample showed, or a mean still farther 
away from M, would occur at random. For this purpose 
we would compare the deviation of our sample from M 
with the standard deviation of the distribution of such 
samples, obtained by dividing the standard deviation of 
individuals by the square root of p, the number in the 
sample. The ratio of the deviation found, to the standard 
deviation, is the criterion, and the larger it is the more 
likely is it that red-headed men really do tend to be tall. 
For many practical purposes we take a deviation of over 
twice the standard deviation as “ significant.” 
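The criterion just described can be written out as a small calculation. In the Python sketch below every numerical value (the sample mean, the population mean M, the standard deviation of individuals, and the sample size) is invented for illustration, and the function name is likewise an assumption.

```python
import math

def critical_ratio(sample_mean, population_mean, sd_individuals, p):
    """Deviation of a sample mean from M, in units of its standard error."""
    standard_error = sd_individuals / math.sqrt(p)
    return (sample_mean - population_mean) / standard_error

# All four figures below are hypothetical.
ratio = critical_ratio(sample_mean=68.2, population_mean=67.5,
                       sd_individuals=2.5, p=100)

# The rough working rule: more than twice the standard deviation.
significant = abs(ratio) > 2
print(round(ratio, 2), significant)
```

The rule of "twice the standard deviation" is only the conventional threshold mentioned in the text; the sketch does not compute an exact probability.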

Sometimes the reader will find significance questions
discussed in terms of the “probable error” instead of the
standard deviation. The probable error is best considered
as a conventional reduction of the standard deviation (or
standard error, as it is sometimes called) to two-thirds of
its value (more exactly, to .67449 of its value).

Not only would the average height, or the average weight, 
of the sample of red-headed men differ from sample to 
sample. Statistics calculated in more complex ways from 
the measurements will also vary from sample to sample, 
as, for example, the variance of height, or the variance of 
weight, or the correlation of height and weight. Let us 
consider first the variance of the heights. In the whole 
population this is calculated by finding the mean, expres- 
sing every height as a plus or minus deviation from the 
mean, squaring all these deviations, and dividing the sum 
by the number in the population. 

This is also how we would find the variance of the sample 
if we really want the variance of the sample. But if we 
want an estimate of the variance in the whole population, 
and the sample is small, it is better to divide by one less 
than the number in the sample. A glimpse of the reason 
for this can be got by considering the case of the smallest
possible sample, namely, one man. Here the mean of the
sample is the one height that we have measured, and the 
deviation of that measurement from the mean of the sample 
is zero. The formula if we divide by the number in the 
sample (one) will give zero for the variance—and that is 
correct for the sample. But it would be too bold to estimate 
the variance of the whole population from one measurement: 
if we divide by one less than the sample we get variance 
= 0/0, that is, we don’t know, which is a wiser state- 
ment.* 

More generally we can begin to understand the reason 
for dividing by (p — 1) instead of by p by the following 
considerations. 

The quantity we want to estimate is the mean square 
deviation of the measurements of the whole population, 
the deviations being taken from the mean of that whole 
population. We do not, however, know that true mean, 
and therefore in a sample we are reduced to using the mean 
of the sample, which except by a miracle will not exactly 
coincide with the true or population mean. The conse- 
quence is that the sum of the squares we obtain is smaller 
than it would have been had we known and used the true 
mean. For it is a property of a mean that the sum of the 
squares of deviations from it is smaller than of deviations 
from any other point. 


* It is important to remember that sampling the population is not
the only source of error in the measurement of statistics, e.g. the
correlation coefficient. All sorts of influences may disturb it. These
will usually “attenuate” the correlation coefficient, i.e. tend to
bring it nearer to zero, as can be seen when we consider that a perfect
correlation only can be reduced by error. But they will not always
do so, and if the errors in the two trait measurements are themselves
correlated, they may even increase the true correlations in a majority
of cases. An estimate of the amount of variable error present can
be made from the correlation of two measurements of the same
trait on the same group, a correlation called the “reliability,” which
should be perfect if no variable errors are present. Spearman’s
correction for attenuation (see Brown and Thomson, 1925, 156) is based
upon this. Like all estimates, the correction for attenuation is correct,
even if the errors are uncorrelated, only on the average and not in
each instance, and it should never be used unless it is small. If it
is large, the experiments are “unreliable” and should be improved.

Consider for example the numbers 2, 3, and 7. Their
mean is 4, and the sum of the squares about 4 is

$$(-2)^2 + (-1)^2 + 3^2 = 14$$

About any other point this sum will be greater than 14.
About 5, for example, the sum is

$$(-3)^2 + (-2)^2 + 2^2 = 17$$

About 2 the sum is

$$0^2 + 1^2 + 5^2 = 26$$


It follows that the sum of the squares we obtained by 
using the sample mean was as small as possible, and in the 
immense majority of cases smaller than the sum about the 
true mean. It is to compensate for this that we divide 
by (p — 1) instead of by p. 
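The arithmetic of this example, and the two divisors, can be repeated in a few lines of Python. This is a minimal sketch; the function and variable names are chosen for the illustration only.

```python
def sum_of_squares_about(values, point):
    """Sum of squared deviations of the values from a given point."""
    return sum((x - point) ** 2 for x in values)

values = [2, 3, 7]
p = len(values)
mean = sum(values) / p

print(sum_of_squares_about(values, mean))  # 14.0, about the sample mean
print(sum_of_squares_about(values, 5))     # 17
print(sum_of_squares_about(values, 2))     # 26

# Variance of the sample itself: divide by p.
variance_of_sample = sum_of_squares_about(values, mean) / p
# Estimate of the variance in the whole population: divide by p - 1.
estimate_for_population = sum_of_squares_about(values, mean) / (p - 1)
print(variance_of_sample, estimate_for_population)
```

Because the sum about the sample mean is the smallest possible, dividing by p tends to understate the population variance; dividing by (p − 1) gives the larger estimate.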

These elementary considerations do not of course indi- 
cate just why this procedure should, in the long run, ex- 
actly compensate for using the sample mean. Why not 
(p — 2), one might say, or (p — 3)? It is not possible, in 
an elementary account like the present, to answer this. 
Geometrical considerations, however, throw some further 
light on the problem. The p measurements of the sample 
may be thought of as existing in a certain space of (p — 1) 
dimensions. For example, two points define a line (of one 
dimension), three points define a plane (of two dimensions), 
and so on. The true mean of the whole population is not 
likely to be within that space, whereas the mean of the 
sample is. The deviations we have actually squared and 
summed are therefore in a space of one dimension less than 
the space containing the true mean. One “ degree of free- 
dom ” has been lost by the fact that we have forced the 
lines we are squaring to exist in a space of (p — 1) di- 
mensions instead of permitting them to project into a 
p-space. Hence the division by (p — 1) instead of p. 

This principle goes farther. For each statistic which we
calculate from the sample itself and use in our subsequent
calculations, we lose a “degree of freedom.”

The standard error of a variance v, if the parent population
from which the samples are drawn is normally distributed,
is estimated as


where p is the number of persons in the sample. The
standard error of a correlation coefficient r is, with the
same condition, estimated as

$$\frac{1 - r^2}{\sqrt{p - 1}}$$


The use of this standard error, however, should be dis- 
continued (unless the sample is large and r small). 

Fisher (1925, page 202) has pointed out that the use of the 
formula for the standard error of a correlation coefficient 
is valid only when the number in the sample is large and 
when the true value of the correlation does not approach 
± 1. For in small samples the distribution of r is not
normal, and even in large samples it is far from normal
for high correlations. The distribution of r for samples
from a population where the correlation is zero differs
markedly from that where the correlation is, say, 0.8.
This means that the use of a standard error for testing 
the significance of correlation coefficients should, except 
under the above conditions, be discouraged. 

To get over the difficulty Fisher transforms r into a new
variable z given by

$$z = \tfrac{1}{2}\{\log_e(1 + r) - \log_e(1 - r)\} = r + \tfrac{1}{3}r^3 + \tfrac{1}{5}r^5 + \ldots$$

It is not, however, necessary to use this formula, as complete
tables have been published for converting values
of r into the corresponding values of z. As r goes from −1
to +1, z goes from −∞ to +∞, and r = 0 corresponds
to z = 0.

The great advantage of using z as a variable instead of r 
is that the form of the distribution of z depends very little 
upon the value of the correlation in the population from 
which samples are drawn. Though not strictly normal, it 
tends to normality rapidly as the size of the sample is
increased, and even for small samples the assumption
of normality is adequate for all practical purposes. The
standard deviation of z may in all cases be taken to be
1/√(p − 3), where p is the number of persons in the sample.
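Where tables are not to hand, the transformation is easily computed. The following Python sketch is illustrative only; the function names are assumptions, and the series is summed to a fixed number of terms merely to show that it agrees with the logarithmic form.

```python
import math

def fisher_z(r: float) -> float:
    """Fisher's z = 1/2 {log_e(1 + r) - log_e(1 - r)}."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))

def z_series(r: float, terms: int = 50) -> float:
    """Partial sum of the series r + r^3/3 + r^5/5 + ..."""
    return sum(r ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

def standard_deviation_of_z(p: int) -> float:
    """Standard deviation of z for a sample of p persons: 1 / sqrt(p - 3)."""
    return 1 / math.sqrt(p - 3)

print(round(fisher_z(0.8), 3))               # 1.099
print(round(standard_deviation_of_z(50), 3)) # 0.146
```

Note that the standard deviation of z depends only on the size of the sample, not on the value of r, which is the advantage claimed for the transformation.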

3. Error of a single tetrad-difference. For our discussion
of the influence of sampling on the factorial analysis of
tests one of the most important quantities to know is the
standard error of the tetrad-difference. There has been
much debate concerning the proper formula for this. (See
Spearman and Holzinger, 1924, 1925, 1929; Pearson and
Moul, 1927; Wishart, 1928; Pearson, Jeffery, and Elderton,
1929; Spearman, 1931.) That generally employed is
formula (16) in the Appendix to Spearman’s The Abilities
of Man:

Standard error of $r_{13}r_{24} - r_{23}r_{14}$ =

$$\frac{2}{\sqrt{N}}\sqrt{\bar{r}^2(1 - \bar{r})^2 + (1 - 2\bar{r}^2)\,s^2}$$

[Spearman and Holzinger’s formula (16).]

where N is the number of persons in the sample,*
$\bar{r}$ is the mean of the four correlation coefficients, and
$s^2$ is their mean squared deviation (variance) from $\bar{r}$.



The probable error is .6745 times the above. A worked
example will be found on page xii of Spearman’s Appendix,
using (which is all one can do) the observed values of the r’s.
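As an illustration of the arithmetic only, the Python sketch below evaluates the standard error for one tetrad-difference. It assumes that formula (16) has the form (2/√N)·√{r̄²(1 − r̄)² + (1 − 2r̄²)s²}, and it borrows four correlations and N = 50 from the five-test example worked later in this chapter. The function names are invented for the sketch.

```python
import math

def tetrad_difference(r13, r24, r23, r14):
    """The tetrad-difference r13*r24 - r23*r14."""
    return r13 * r24 - r23 * r14

def tetrad_standard_error(r13, r24, r23, r14, n_persons):
    """Formula (16), in the form assumed in the paragraph above."""
    rs = [r13, r24, r23, r14]
    r_bar = sum(rs) / 4                          # mean of the four r's
    s2 = sum((r - r_bar) ** 2 for r in rs) / 4   # their mean squared deviation
    inside = r_bar ** 2 * (1 - r_bar) ** 2 + (1 - 2 * r_bar ** 2) * s2
    return 2 / math.sqrt(n_persons) * math.sqrt(inside)

# Correlations taken from the five-test example later in the chapter.
t = tetrad_difference(r13=0.34, r24=0.32, r23=0.56, r14=0.33)
se = tetrad_standard_error(0.34, 0.32, 0.56, 0.33, n_persons=50)
probable_error = 0.6745 * se
print(round(t, 3), round(se, 3), round(probable_error, 3))
```

The ratio of the tetrad-difference to its standard error is then judged in the same way as any other deviation.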

It will be remembered that in Section 7 of Chapter I
we stated Spearman’s discovery in the form “tetrad-differences
tend to be zero.” If tetrad-differences in the
whole population, however, were all actually zero, they
would not remain exactly zero in samples, and it is only
samples that are available to us. We are faced, therefore,
with a two-fold problem. (a) We have to decide, from the
size of the tetrad-differences actually found in our sample,
whether the sample is compatible with the theory that the
tetrad-differences are zero in the whole population. But
(b) we should also go on to consider whether the sample is
equally compatible with the opposed hypothesis that the
tetrad-differences are not zero in the whole population,
leaving a verdict of “not proven.” (See Emmett, 1936.)

* We use p to mean the number of persons in this book, but are
retaining N here and in “formula 16A” below to preserve the usual
appearance of these well-known and much-used expressions.

4. Distribution of a group of tetrad-differences. The
actual calculation, for every separate tetrad-difference, of
its standard error by Spearman and Holzinger’s formula
(16) is, however, an almost impossibly laborious task. In
a table of correlations formed from n tests there are
n(n − 1)/2 correlation coefficients, and n(n − 1)(n − 2)(n − 3)/8
different (though not independent) tetrad-differences.
Any one particular correlation coefficient is
concerned in (n − 2)(n − 3) different tetrad-differences,
and any one test in (n − 1)(n − 2)(n − 3)/2 different
tetrad-differences. Thus with ten tests there are 630
tetrad-differences, and with twenty tests 14,535 tetrad-differences.
In the latter case, any one test is concerned
in 2,907. Under these circumstances, it is natural to look
for a more wholesale method than that of calculating the
standard error of each tetrad-difference. The method
adopted by Spearman is to form a table of the distribution
of the tetrad-differences, and compare this distribution
with that of a normal curve centred at zero and with
standard deviation given by

$$\frac{2}{\sqrt{N}}\sqrt{\bar{r}^2(1 - \bar{r})^2 + (1 - R)\,s^2}$$

[Spearman and Holzinger’s formula (16A).]

where N = number of persons in the sample,
$\bar{r}$ = the mean of all the r’s in the whole table,
$s^2$ = their mean squared deviation from $\bar{r}$,

$$R = 3\bar{r}\,\frac{n - 4}{n - 2} - 2\bar{r}^2\,\frac{n - 6}{n - 2},$$

and n = number of tests.
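The counts quoted above, and the standard deviation of the whole group of tetrad-differences, can be reproduced with the Python sketch below. It is an illustration under stated assumptions: formula (16A) is taken in the form (2/√N)·√{r̄²(1 − r̄)² + (1 − R)s²} with R = 3r̄(n − 4)/(n − 2) − 2r̄²(n − 6)/(n − 2), the function names are invented, and the ten correlations and N = 50 are those of the five-test example given later in the chapter.

```python
import math

def count_correlations(n: int) -> int:
    return n * (n - 1) // 2

def count_tetrad_differences(n: int) -> int:
    return n * (n - 1) * (n - 2) * (n - 3) // 8

def tetrads_per_test(n: int) -> int:
    return (n - 1) * (n - 2) * (n - 3) // 2

def big_r(r_bar: float, n: int) -> float:
    """The quantity R of formula (16A), in the form assumed above."""
    return 3 * r_bar * (n - 4) / (n - 2) - 2 * r_bar ** 2 * (n - 6) / (n - 2)

def sd_of_tetrad_differences(correlations, n_tests, n_persons):
    """Formula (16A) applied to all the correlations of a table."""
    r_bar = sum(correlations) / len(correlations)
    s2 = sum((r - r_bar) ** 2 for r in correlations) / len(correlations)
    inside = r_bar ** 2 * (1 - r_bar) ** 2 + (1 - big_r(r_bar, n_tests)) * s2
    return 2 / math.sqrt(n_persons) * math.sqrt(inside)

print(count_tetrad_differences(10), count_tetrad_differences(20),
      tetrads_per_test(20))   # 630 14535 2907

table = [0.50, 0.34, 0.33, 0.24, 0.56, 0.32, 0.15, 0.13, 0.35, 0.29]
sd = sd_of_tetrad_differences(table, n_tests=5, n_persons=50)
print(round(sd, 3))
```

With n = 4 the quantity R reduces to 2r̄², so that the expression takes the same form as formula (16) for a single tetrad.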

Numerous examples of the comparison of “histograms”
of tetrad-differences with normal curves whose standard
deviation is found by (16A) are given in Spearman’s The
Abilities of Man. This method of establishing the hypothesis,
that the tetrad-differences are derived by sampling
from a population in which they are really zero, is open to
the same doubt as was explained in the simpler case of
one tetrad-difference. The comparison can prove that
the tetrad-differences observed are compatible with that
hypothesis. It does not in itself prove that they are
compatible with that hypothesis only; and, as Emmett
has shown in the article already mentioned, the odds are
commonly rather against this.

The usual practice, moreover, is to “purify” the battery
of tests until the actual distribution of tetrad-differences
agrees with (16A), so that in effect all that is then proved
is that a team can be arrived at which can be described in 
terms of two factors. This, although a more modest 
claim than has often been made, and certainly less than 
is implicitly understood by the average reader, is never- 
theless a matter of some importance. Not all teams of 
tests can be explained by one common factor; but it is 
not very difficult to find teams which can. There is little 
doubt in the minds of most workers that a tendency towards 
hierarchical order actually exists among mental tests. 

5. Spearman’s saturation formula.—It will be remem- 
bered from Section 4 of Chapter I that the calculation of 
the g saturation of each test forms an important part of 
the Spearman process. We saw there that in a hierarchical
matrix each correlation is the product of the two g saturations
of the tests, for example

$$r_{12} = r_{1g}\, r_{2g}$$

Since this is so, each g saturation can be calculated
from the correlations of a test with two others, and their
inter-correlation. Thus to find $r_{1g}$ we can take Tests 2 and
3 as reference tests, when we have

$$\frac{r_{12}\, r_{13}}{r_{23}} = \frac{r_{1g} r_{2g} \cdot r_{1g} r_{3g}}{r_{2g}\, r_{3g}} = r_{1g}^2$$

When the matrix is really hierarchical, and there are
no sampling errors present, it is immaterial which two tests
we associate with Test 1 in order to find its g saturation.
We have, in fact, in that case

$$r_{1g}^2 = \frac{r_{12}\, r_{13}}{r_{23}} = \frac{r_{14}\, r_{15}}{r_{45}} = \frac{r_{12}\, r_{15}}{r_{25}} = \text{etc.}$$



But even if the correlations, measured in the whole
population, were really exactly hierarchical, sampling
errors would make these fractions differ somewhat from
one another, and we are faced with the problem of deciding
which value to accept for the g saturation. The average
of all possible fractions like the above would be one very
plausible quantity to take but is laborious to compute.
Spearman therefore adopts a fraction

$$\frac{r_{12}r_{13} + r_{12}r_{14} + r_{12}r_{15} + r_{13}r_{14} + \text{etc.}}{r_{23} + r_{24} + r_{25} + r_{34} + \text{etc.}}$$

whose numerator is the sum of the numerators, and whose
denominator is the sum of the denominators, of the single
fractions. This combined fraction he computes in a
tabular manner which we will next describe, by the
algebraically equivalent formula

$$r_{1g}^2 = \frac{A_1^2 - A_1'}{T - 2A_1}$$

[Spearman’s formula (21), Appendix, Abilities of Man.]

The quantities $A_1$, $A_2$, etc., are the sums of the rows (or
columns) of the matrix of correlations without any entries
in the diagonal cells. (The arithmetical example is confined
to five tests to economize space):

         1      2      3      4      5       A       A²
  1      .     .50    .34    .33    .24     1.41    1.988
  2     .50     .     .56    .32    .15     1.53    2.341
  3     .34    .56     .     .13    .35     1.38    1.904
  4     .33    .32    .13     .     .29     1.07    1.145
  5     .24    .15    .35    .29     .      1.03    1.061
                                        T = 6.42

T is the sum of all the A’s, and therefore of all the
correlations in the table (where each occurs twice). A
new table is now written out, with each coefficient squared,
and its rows summed to obtain the quantities A’:

         1       2       3       4       5        A’
  1      .     .250    .116    .109    .058     .533
  2    .250      .     .314    .102    .023     .689
  3    .116    .314      .     .017    .123     .570
  4    .109    .102    .017      .     .084     .312
  5    .058    .023    .123    .084      .      .288

The calculation of all the saturations is then best performed
in a tabular manner, thus:

 Test     A²      A’      2A     T − 2A   (A² − A’)/(T − 2A)   Saturation
  1     1.988    .533    2.82     3.60          .4042              .66
  2     2.341    .689    3.06     3.36          .4917              .70
  3     1.904    .570    2.76     3.66          .3645              .60
  4     1.145    .312    2.14     4.28          .1946              .44
  5     1.061    .288    2.06     4.36          .1773              .42

where the last column is the square root of the preceding.
The reader should calculate the six different values of
$r_{1g}$ from the original table by the formula $\sqrt{r_{1i}\, r_{1j} / r_{ij}}$
for comparison with the value .66 obtained above. He
will find

   .55    .72    .89
          .93    .48
                 .52

with an average of .68.
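The tabular calculation can be repeated mechanically. The Python sketch below is one possible arrangement, not Spearman's own layout; the variable names are assumptions, and the correlations are those of the five-test example. Evaluated in this way, Tests 2 to 5 reproduce the printed saturations to two decimals, while for Test 1 the square root of .4042 comes out at about .64, a little below the .66 printed in the table.

```python
import math
from itertools import combinations

# Observed correlations of the five-test example (each pair once).
r = {(1, 2): .50, (1, 3): .34, (1, 4): .33, (1, 5): .24,
     (2, 3): .56, (2, 4): .32, (2, 5): .15,
     (3, 4): .13, (3, 5): .35, (4, 5): .29}
tests = [1, 2, 3, 4, 5]

def corr(i, j):
    """Correlation between tests i and j, in either order."""
    return r[(min(i, j), max(i, j))]

# Row sums A, row sums of squares A', and the grand total T.
A = {i: sum(corr(i, j) for j in tests if j != i) for i in tests}
A_dash = {i: sum(corr(i, j) ** 2 for j in tests if j != i) for i in tests}
T = sum(A.values())

# Formula (21): squared saturation = (A^2 - A') / (T - 2A).
saturation = {i: math.sqrt((A[i] ** 2 - A_dash[i]) / (T - 2 * A[i]))
              for i in tests}

# The six single-fraction values for Test 1, one per pair of reference tests.
single = [math.sqrt(corr(1, i) * corr(1, j) / corr(i, j))
          for i, j in combinations([2, 3, 4, 5], 2)]

print({i: round(s, 2) for i, s in saturation.items()})
print([round(v, 2) for v in single], round(sum(single) / 6, 2))
```

The combined fraction and formula (21) are the same quantity, since the sum of the products of pairs in row 1 is (A² − A’)/2 and the sum of the remaining correlations is (T − 2A)/2.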

6. Residues.—If the correlations which would arise from 
these saturations or loadings are calculated, and subtracted 
from the observed correlations, we obtain the residues 
which have then to be examined to see if they are small 
enough to be attributable to sampling error. In the
following double table of correlations are set out the observed
correlations uppermost, and those calculated from
the g saturations below. The difference is the residue,
which may be plus or minus:

 g loadings     .66    .70    .60    .44    .42

    .66          .     .50    .34    .33    .24
                       .46    .40    .29    .28
    .70         .50     .     .56    .32    .15
                .46           .42    .31    .29
    .60         .34    .56     .     .13    .35
                .40    .42           .26    .25
    .44         .33    .32    .13     .     .29
                .29    .31    .26           .18
    .42         .24    .15    .35    .29     .
                .28    .29    .25    .18

The lower numbers are the products of the two
saturations. In this case the residues range from −.14
to +.14 and at first sight appear in many cases to be
too large to be neglected in comparison with the original
correlations.

To check this impression, consider the correlation .56
and the value .42 from which it is supposed to depart only
by sampling error, a deviation of .14. Fisher’s z corresponding
to r = .42 is .45, and that corresponding to r =
.56 is z = .63, so that the z deviation is .18. The standard
deviation of z for 50 cases is 1 ÷ √47 = .15. The deviation
is little larger than one standard deviation and cannot
therefore be called significant. But as the reader will observe,
this conclusion is due more to the large size of the
standard error than to the small size of the residue. The
residue is here attributable to sampling error, because the
latter is so large. But because the latter is large it does not
follow that the large residue is certainly due to it.
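The check on the largest residue may be repeated as follows. This Python sketch is illustrative; the function names are assumptions, and the three figures used (.56, .42, and 50 cases) are those of the paragraph above.

```python
import math

def fisher_z(r: float) -> float:
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)

def z_ratio(observed_r: float, expected_r: float, p: int) -> float:
    """Difference of the two z values in units of the standard deviation of z."""
    deviation = fisher_z(observed_r) - fisher_z(expected_r)
    return deviation / (1 / math.sqrt(p - 3))

ratio = z_ratio(observed_r=0.56, expected_r=0.42, p=50)
print(round(fisher_z(0.42), 2), round(fisher_z(0.56), 2), round(ratio, 2))
```

A ratio of this size, a little over one standard deviation, falls well short of the conventional threshold of two.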

7. Reference values for detecting specific correlation —If 
after a calculation like that described, one of the residues 
is found to be too large to be explicable by sampling error, 
the excess of correlation over that due to g is attributed to 
“specific correlation,” meaning correlation due to a part 
of their specific factors being not really unique but shared 
by these two tests. In the case of our numerical example, 
if the number of subjects tested had been larger, the standard 
errors of the coefficients would have been smaller, and some 
of the discrepancies between the experimental values and 
those calculated from the g saturations would have been 
too large to be overlooked, but would have had to be 
attributed to specific correlation. In such a case, the g 
loadings would, of course, be wrong and would have to be 
recalculated from the battery after one of the tests con- 
cerned in the specific correlation was removed from it. 
Later, the other test could be replaced in the battery 
instead of the first, and thus its g saturation found. The 
difference between the experimental correlation of the 
two, and the product of their g saturations, with a standard 
error dependent on the size of the sample, would be then 
attributed to their specific linkage. 




If two tests, v and w, are thus suspected of having a
specific link as well as that due to g, it is clear that the
smallest battery of tests which could be used in the above
manner to detect that link would be one of two other tests,
x and y, say, to make up a tetrad:

             v            x
   w     $r_{vw}$     $r_{wx}$
   y     $r_{vy}$     $r_{xy}$


and these two “ reference ” tests would have to be known 
to have no specific links with each other or with the two 
suspected tests. The example which gave rise to Figure 5 
(see Chapter I, page 15) illustrates this. Tests 2 and 3 
there are, let us suppose, those with a suspected specific 
link. The tetrad-difference to be examined by means of 
Spearman’s formula (16) is that which has $r_{23}$ as one corner.
In such a case, where the two reference tests 1 and 4 are 
known to have no link except g with one another, or with 
the other two tests, two of the possible tetrad-differences 
ought to be larger than three times the standard error 
given by formula (16), and equal to one another, while the 
third tetrad-difference should be zero (or sufficiently near 
to zero, in practice) (Kelley, 1928, 67). 

The g saturation of each of the tests under examination
for specific correlation can be found by grouping it with
the two reference tests. Thus in the case of our Figure 5,
we have

$$r_{2g}^2 = \frac{r_{12}\, r_{24}}{r_{14}}, \qquad r_{3g}^2 = \frac{r_{13}\, r_{34}}{r_{14}}$$

Therefore the correlation between 2 and 3 which is due
to g is

$$r_{2g}\, r_{3g} = \sqrt{r_{2g}^2}\,\sqrt{r_{3g}^2}$$

and the difference between this and .8, the actual value,
is the part to be explained by the specific factor shared by
these two tests.

When there are several reference tests available, all
believed to have no link except g with one another or with 
the two tests suspected of specific overlap, there will be 
a number of ways of picking two of them to obtain the 
tetrad required to decide the matter, and the results will, 
because of sampling and other errors, be discrepant. Under 
these circumstances Spearman has devised an interesting 
procedure for amalgamating the results into one. A 
numerical example is given by him on page xxii of the 
Appendix to The Abilities of Man. 


CHAPTER IV 
THE DEFINITION OF g 


1. Any three tests define a “ g.”—The idea of g arose out of 
Professor Spearman’s acute observation that correlation 
coefficients between tests tend to show hierarchical order : 
that is, that their tetrad-differences tend to be zero or small; 
or in more technical terms still, that the rank to which a 
matrix of correlation coefficients can be “ reduced ” by 
suitable diagonal elements tends towards rank one. This 
fundamental fact is at the basis of all those methods of 
factorial analysis which magnify specific factors. In consequence,
correlation coefficients between a number of variables
can be adequately accounted for by a few common
factors. To be adequately described by one only,
the “reduced” rank of the correlation matrix has to be
one, within the limits of sampling error.

Suppose now that we have three tests and have, in the
whole population, measured their correlation coefficients.
If, as is usually the case, these coefficients are all positive,
and if each of them is at least as large as the product of the
other two, we can explain them by assuming one g and
three specifics $s_1$, $s_2$, and $s_3$. There are many other ways
of explaining them, but let us adopt this one. We have
thereby defined a factor g mathematically (Thomson, 1935a,
260). It is then for the psychologist to say, from a
consideration of the three tests which define it, what name
this factor shall bear and what its psychological description
is. The psychologist may think, after studying the tests,
that they do not seem to him to have anything in common,
or anything worth naming and treating as a factor. That
is for him to say. Let us suppose that at any rate he does
not reject the possibility, but that he would like an opportunity
of studying other tests which (mathematically
speaking) contain this factor, and have nothing else in
common, before finally deciding.
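The mathematical definition of a g by three tests amounts to the following calculation, given here as a Python sketch. The function names are assumptions; the first set of correlations is taken from the five-test example of Chapter III, and the second from the artificial Heywood example given later in this chapter.

```python
import math

def g_saturations(r12: float, r13: float, r23: float):
    """Saturations of three tests with the one g that explains their correlations."""
    return (math.sqrt(r12 * r13 / r23),
            math.sqrt(r12 * r23 / r13),
            math.sqrt(r13 * r23 / r12))

def admits_one_g(r12: float, r13: float, r23: float) -> bool:
    """True if all are positive and each is at least the product of the other two."""
    return (min(r12, r13, r23) > 0
            and r12 >= r13 * r23
            and r13 >= r12 * r23
            and r23 >= r12 * r13)

g1, g2, g3 = g_saturations(0.50, 0.34, 0.56)
print(round(g1, 3), round(g2, 3), round(g3, 3))
print(admits_one_g(0.50, 0.34, 0.56))     # True
print(admits_one_g(0.945, 0.840, 0.720))  # False: a saturation would exceed unity
```

The products of the saturations in pairs return the three correlations exactly, and what remains of each test's variance, one minus the squared saturation, is assigned to its specific.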

In that case the experimenter must search for a fourth
test which, when added to these three, gives tetrad- 
differences which are zero ; and then for a fifth and further 
tests, each of which makes zero tetrad-differences with the 
tests of the pre-existing battery. This extended battery 
the experimenter would lay before the psychological judge, 
to obtain a ruling whether the single common factor, of 
which it is the now extended but otherwise unaltered 
definition, is worthy of being named as a psychological 
factor. 

2. The extended or purified hierarchical battery—Mathe- 
matically, any three tests with which the experimenter 
cared to begin would define “ a ” g, if we except temporarily 
the case, to which we shall later return, of three correlation 
coefficients, one of which is less than the product of the 
other two. The experimental tester, however, might in 
some cases have great difficulty in finding further tests, to 
add to the original three, which would give zero tetrad- 
differences. Unless he could do so, it is unlikely that the 
psychological judge would accept the factor as worthy of 
a name and separate existence in his thoughts. It is, for 
example, an experimental fact that starting with three 
tests which a general consensus of psychological opinion 
would admit to have only “ intelligence ” as a common 
requirement, it has proved possible to extend the battery 
to comprise about a score of tests without giving any 
tetrad-differences which cannot be regarded as zero.* 
Even that has not been accomplished without difficulty,
and without certain blemishes in the hierarchy having to be
removed by mathematical treatment. But the fact that
with these reservations it is possible, and that psychological
judgment endorses the opinion that each test of this battery
requires “intelligence,” is the main evidence behind the
actual “existence” of such a factor as “g, general intelligence.”
It must be noted that the word “existence”
here does not mean that any physical entity exists which
can be identified with this g. It does mean, however, that,
as far as the experimental evidence goes, there is some
aspect of the causal background which acts “as if” it
were a single unitary factor in these tests.

* The process of making such a battery of tests to define general
intelligence (see Brown and Stephenson, 1933) has not in fact taken
the form of choosing three tests as the basal definition and then
extending the battery. Instead, a number of tests which, it was
thought from previous experience, would act in the desired way have
been taken, and the battery thus formed has then been purified by
the removal of any tests which broke the hierarchy.

The important point to note is that the experimenter has 
produced a battery of tests which is, he claims, hierarchical; 
that the mathematician assures him that such a battery
acts “as if” it had only one factor in common (though it
can also be explained in many other ways), and that the
psychologist agrees that psychologically the existence of 
such a factor as the sole link in this battery seems a reason- 
able hypothesis. 

3. Different hierarchies with two tests in common.—Now, 
it must be remembered that, starting with three other 
tests, which may contain two of the former set, it may 
very well be possible to build up a different hierarchy. 
Only experiment could show whether this were possible in
each case; there is no mathematical difficulty in the way.
Such a hierarchy would also define “a ” g, but this would 
be usually a different factor from the former g. If there 
were three tests common to the two hierarchies, then the 
two g’s could be identified with one another (sampling 
errors apart), and the three tests would be found to have 
the same saturations with the one g as with the other. But 
if only two tests were common to the two batteries this 
would not in general be the case, and the different satura- 
tions of these tests with the two g’s would show that the 
latter were different (Thomson, 1935a, 261-2). Under 
such circumstances the psychologist has to choose. He 
cannot have both these g’s. Both are mathematically of
equal standing; it is a psychological decision which has to
be made. When one g is accepted, the other, as a factor,
must then be rejected and a more complicated factorial 
analysis of the second hierarchy has to be built up which 
is consistent with this. 

4. A test measuring “pure g.” Although the hierarchical
battery defines a g, it does not enable it to be measured
exactly (but only to be estimated) unless either it contains
an infinite number of tests, or a test can be found which
conforms to the hierarchy and has a g saturation of unity.*
In the latter case this test which is “pure g” is such that
when it is considered along with any other two tests of its
hierarchy, its correlations with them, multiplied together,
give the intercorrelation of those two with one another:
if k is the “pure” test, then

$$r_{ik}\, r_{jk} = r_{ij}$$

its g saturation being

$$\sqrt{\frac{r_{ik}\, r_{jk}}{r_{ij}}} = 1$$


No such “ pure ” test of the g which is defined by the 
Brown-Stephenson hierarchy of nineteen tests has yet been 
found. Such a pure test, with full g saturation, must not 
be confused with tests which are sometimes called tests of 
pure g because they do not contain certain other factors, 
in particular the verbal factor. Thus the “S.V.P.” 
(Spearman Visual Perception) tests are referred to by 
Dr. Alexander (1935, 48) as a “pure measure of g”; but
their saturations with g are given by him (page 107) as
.757, .701, and .736 respectively, so that in each case only
about half the variance is “g” and half is a specific.

5. The Heywood case—Consider the case where three 
tests are such that— 

$$r_{ik}\, r_{jk} > r_{ij}$$

In such a case the g saturation of the test k, if we calcu- 
late it, is greater than unity, which is impossible. Yet it 
is possible, in theory at least, to add tests to such a triplet 
to form an extended hierarchy with zero tetrad-differences. 
There can be one such case (but only one) in a hierarchy. 
We shall call them Heywood cases, as this possibility was 
first pointed out by him (Heywood, 1931). As an artificial 
example, consider these correlations:

           1       2       3       4       5
  1      1.000    .945    .840    .735    .630
  2       .945   1.000    .720    .630    .540
  3       .840    .720   1.000    .560    .480
  4       .735    .630    .560   1.000    .420
  5       .630    .540    .480    .420   1.000

* It is understood, of course, that even such a test would give
different measures of a man’s g from day to day, if the man’s
performance in it varied (as it undoubtedly would) from day to day.
By measuring with exactness is meant, in this part of the text,
measurement free from the uncertainty due to the factors
outnumbering the tests. We are assuming sampling errors to be nil.

This is a perfect hierarchy, every tetrad-difference being 
exactly zero. It is, moreover, a perfectly possible set of 
correlations, and passes the tests required for a matrix of 
correlations to be possible. For example, the determinant 
of the matrix is positive. But when we calculate the g 
saturations of the tests we find them to be : 


 Test             1      2      3      4      5

 g saturation    1.05    .9     .8     .7     .6

so that a single general factor is an impossible explanation
of this hierarchy as far as Test 1 is concerned. The
correlations of Test 1 with the other tests are possible, and
they give exactly zero tetrad-differences: but yet the test
cannot be a “two-factor” test, for the correlations of the
first row are too high to be explained in that way. The
rule governing its possible existence has been given by
Ledermann, namely, that the g saturation of the Heywood
case cannot exceed

$$\sqrt{\frac{1 + S}{S}}$$

where S is the quantity familiar from Spearman’s formula

$$S = \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2}$$

for the remainder of the hierarchy (i = 2, 3, 4, ...). If,
then, a Heywood test can be found to conform to a hier- 
archy, it seems likely that the g defined by that hierarchy 
must be abandoned. The seeker for a test for pure g is 
thus in a delicate position. He wants to find a test with 
full saturation of unity. But he must just hit the. mark. 
If the saturation exceeds unity, his whole hierarchy must 
be abandoned as a definition. And even when the exact 


THE DEFINITION OF g ‘ 53 


saturation of unity has been found, there seems to be too 
narrow a line dividing the perfect from the impossible, and 
the reality of the g seems to be balanced on a knife edge. 
In actyal practice, of course, sampling errors would make 
the situation less acute and could for some time be called 
in to explain a certain amount of excess saturation over 
unity. 
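The arithmetic of this Heywood example can be retraced with a short sketch. It assumes the usual triad relation for a perfect hierarchy, r_ig^2 = r_ij r_ik / r_jk, and takes Ledermann's limit to be the square root of (1 + S)/S; that form of the limit is an assumption here, chosen because it is the condition under which the determinant of the matrix stays positive. The names `R`, `corr` and `g_saturation` are illustrative only.

```python
import math

# Upper triangle of the five-test hierarchy quoted above (sampling errors nil).
R = {(1, 2): .945, (1, 3): .840, (1, 4): .735, (1, 5): .630,
     (2, 3): .720, (2, 4): .630, (2, 5): .540,
     (3, 4): .560, (3, 5): .480, (4, 5): .420}


def corr(i, j):
    """Correlation of tests i and j, in whichever order they are given."""
    return R[(min(i, j), max(i, j))]


def g_saturation(i, tests=(1, 2, 3, 4, 5)):
    """Saturation of test i from one triad: r_ig^2 = r_ij * r_ik / r_jk."""
    j, k = [t for t in tests if t != i][:2]
    return math.sqrt(corr(i, j) * corr(i, k) / corr(j, k))


saturations = {i: g_saturation(i) for i in (1, 2, 3, 4, 5)}

# S for the remainder of the hierarchy (Tests 2 to 5), then the assumed limit.
S = sum(a * a / (1 - a * a) for i, a in saturations.items() if i != 1)
bound = math.sqrt((1 + S) / S)

print({i: round(a, 3) for i, a in saturations.items()})
print("S =", round(S, 3), " limit =", round(bound, 3))
```

Test 1 comes out with a saturation above unity but still below the limit, which is why the matrix remains a possible set of correlations.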

6. Hierarchical order when tests equal persons in number — 
If a test cannot be found whose saturation with g is unity 
(“pure g”), the other method of measuring g exactly 
would seem to be to extend the hierarchy until it comprised
so many tests that the multiple correlation with g,

\[ \sqrt{\frac{S}{1 + S}}, \]

became practically unity. For S increases with the number
of tests, being the sum of the positive quantities

\[ \frac{r_{ig}^2}{1 - r_{ig}^2}. \]
There is here a point of some theoretical interest, namely, 
what happens when we have increased the number of 
hierarchical tests until they are as numerous as the persons 
to whom they are given ? This, in view of the difficulty of 
finding tests to add to a hierarchy, is admittedly not a 
question likely to trouble experimenters, but its theoretical 
implications are considerable. 

It can be shown that whenever we have a matrix of 
correlations based upon the same number of tests as 
persons, its determinant is zero. Now the determinant of 
a hierarchical matrix (with unity in each diagonal cell) 
can be shown to be of the form— 

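\[
\begin{aligned}
 &(1 - r_{1g}^2)(1 - r_{2g}^2)(1 - r_{3g}^2)(1 - r_{4g}^2)\cdots \\
 &+ r_{1g}^2\,(1 - r_{2g}^2)(1 - r_{3g}^2)(1 - r_{4g}^2)\cdots \\
 &+ (1 - r_{1g}^2)\,r_{2g}^2\,(1 - r_{3g}^2)(1 - r_{4g}^2)\cdots \\
 &+ (1 - r_{1g}^2)(1 - r_{2g}^2)\,r_{3g}^2\,(1 - r_{4g}^2)\cdots \\
 &+ (1 - r_{1g}^2)(1 - r_{2g}^2)(1 - r_{3g}^2)\,r_{4g}^2\cdots \\
 &+ \cdots
\end{aligned}
\]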


and it is clear that each of these quantities is positive 
unless we have a case of pure g, or a Heywood case. A
case of pure g will leave one of the rows of the above sum
non-zero. To make the whole sum zero, one case must be
a Heywood case, giving $1 - r_{ig}^2$ negative.

It would seem, therefore, that by the time we have 
added hierarchical tests to make them equal in number to 
the persons, we will necessarily have added a Heywood 
hierarchical case (of which there can be only one in a 
hierarchy). But we have agreed that the discovery of a 
Heywood case will cause us to abandon the hierarchy as 
a definition of g ! 
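The form of the determinant can be checked numerically. The sketch below is only an illustration: it takes the expansion to be the product of the (1 - r_ig^2) plus, for each test in turn, the same product with that test's factor replaced by r_ig^2, and compares it with a determinant obtained by ordinary elimination. The saturations are those of the five-test Heywood hierarchy of the previous section; the function names are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det


def expanded_form(sat):
    """prod(1 - a_i^2) + sum over i of a_i^2 * prod over j != i of (1 - a_j^2)."""
    u = [1 - a * a for a in sat]
    total = math.prod(u)
    for i, a in enumerate(sat):
        total += a * a * math.prod(u[j] for j in range(len(sat)) if j != i)
    return total


sat = [1.05, .9, .8, .7, .6]          # Test 1 is the Heywood case
n = len(sat)
hierarchy = [[1.0 if i == j else sat[i] * sat[j] for j in range(n)]
             for i in range(n)]

det = determinant(hierarchy)
print(round(det, 6), round(expanded_form(sat), 6))
```

With one negative factor (1 - r_1g^2) every row of the sum but one changes sign, which is how a Heywood case allows the whole to fall to zero as tests are added.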

The case where the number of tests is increased to equal 
the number of persons may seem to the reader to be an 
academic case only. But the case of reducing the number 
of persons until they equal the number of tests is one which 
could easily be realized in practice, and presents equal 
theoretical difficulties. This draws attention to the 
dependence of any definition of factors on the sample of 
persons tested. If we have a perfect hierarchy of, say, 
50 tests, in a population of, say, 1,000 persons, a sample of 
fifty persons from the above thousand, if it gives hier- 
archical order, will give a Heywood case, and its g will be 
impossible. 

If the g corresponding to the original analysis on the 
thousand persons were anything real, such as a given 
quantity of mental energy available in each person, then 
it ought always to be possible, one might erroneously 
think, to find fifty persons and fifty tests to give a hierarchy, 
without a Heywood case. But that cannot be easily said. 
It is impossible, from the correlations alone, to distinguish 
a real g from one imitated by a fortuitous coincidence of 
specifics. Even if g were a reality, a sample of persons
equal in number to the tests could not give a hierarchy
without a Heywood case, and their apparent g would be
fortuitous.

Now the case of a test of pure g is on the border line of
the Heywood cases. It is clear then that it will be suspect,
as being probably only fortuitous, if the number of persons
does not far exceed the number of tests.

7. Singly conforming tests—There remains one other
conceivable method of measuring g exactly,* by the use
of certain tests which, when they are all present, destroy
the hierarchy, although any one of them can enter the
battery without marring it: “singly conforming” tests
(Thomson, 1934; and 1935a, 253-6). It will be shown
in later chapters on factor estimation that the reason 
factors cannot be measured exactly, but have to be esti- 
mated only, is that they outnumber the tests. Every 
new test which conforms to a hierarchy adds a new specific 
(unless it is pure g), and thus continues the excess of factors 
over tests. It can occur, however, that the correlation of 
two tests with each other breaks a hierarchy, although 
either of them alone conforms otherwise. Such a case 
occurs in the Brown-Stephenson battery, for example, one 
of whose correlation coefficients has to be suppressed before 
the hierarchy is acceptable. 

In such a case, if the psychologist is prepared to accept 
either test as a member of the battery, the erring correlation 
coefficient must be due to these two tests sharing some 
portion of their specifics with one another. If, as may 
happen (apart from error which we are supposing absent), 
their intercorrelation shows that they have only one specific 
factor between them, and differ only in their saturations, 
then they enable the estimate of g to be turned into accurate 
measurement. For example, consider the following matrix 
of correlations : 


         1       2       3       4       5       6
1        .     .669    .592    .458    .335    .251
2      .669      .     .566    .438    .870    .240
3      .592    .566      .     .387    .283    .212
4      .458    .438    .387      .     .219    .164
5      .335    .870    .283    .219      .     .120
6      .251    .240    .212    .164    .120      .


This is a perfect hierarchy except for the correlation

\[ r_{25} = .870. \]


* By “exactly” is meant, with the same exactness as the test
scores, without the additional indeterminacy due to an excess of
factors over tests.

Every tetrad-difference which does not contain this
correlation, is zero. If either Test 2 or Test 5 is removed 
from the battery, there remains a perfect hierarchy. If 
Test 5 is removed, we can calculate from the remaining 
battery the g saturations : 


Test              1       2       3       4       6

g saturation    .837    .800    .707    .548    .300


If we remove Test 2 and restore Test 5, we get the fol- 
lowing : 


Test              1       3       4       5       6

g saturation    .837    .707    .548    .400    .300


From either hierarchy we can estimate g. The correlation
of our estimates with “true g” will be

\[ \sqrt{\frac{S}{S + 1}}, \qquad \text{where } S = \sum \frac{\text{saturation}^2}{1 - \text{saturation}^2}, \]

and we find for the two hierarchies the g correlations of
.92 and .90.
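The two g correlations can be recomputed from the saturation tables. This is a minimal sketch, assuming the multiple correlation is the square root of S/(S + 1) with S the sum of saturation^2 / (1 - saturation^2) over the tests of the hierarchy; the function name is illustrative.

```python
import math


def multiple_correlation(saturations):
    """Correlation of the estimate with g for a one-factor hierarchy."""
    S = sum(a * a / (1 - a * a) for a in saturations)
    return math.sqrt(S / (S + 1))


without_test_5 = [.837, .800, .707, .548, .300]   # Tests 1, 2, 3, 4, 6
without_test_2 = [.837, .707, .548, .400, .300]   # Tests 1, 3, 4, 5, 6

print(round(multiple_correlation(without_test_5), 3))
print(round(multiple_correlation(without_test_2), 3))
```

Because S is a sum of positive quantities, adding any further conforming test can only raise the multiple correlation, though never to unity while every test keeps a specific.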

From the two Tests 2 and 5 alone, however, we can ob- 
tain a g correlation of unity. 

The reason for this is that the correlation of Tests 2 
and 5 is such as to show that their specifics are identical, 


the two tests differing only in their loadings. Their
equations are

\[ z_2 = .8g + \sqrt{1 - .8^2}\; s_2, \]
\[ z_5 = .4g + \sqrt{1 - .4^2}\; s_5. \]

If the whole of $s_2$ is identical with the whole of $s_5$, their
intercorrelation should be

\[ .8 \times .4 + \sqrt{(1 - .8^2)(1 - .4^2)} = .870, \]

and this is its experimental value.


We could, therefore, have seen at the beginning, if we 
had tested the above fact, that these two tests would make
a perfect battery for measuring g. We have the simul-
taneous equations

\[ z_2 = .8g + .6s, \]
\[ z_5 = .4g + .917s, \]
from which we can eliminate s. 
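The elimination can be written out as a small sketch. It assumes standardized scores and error-free tests, as in the artificial example; `recover_g` and the sample values of g and s are hypothetical and serve only to show that g is returned exactly.

```python
import math

a2, a5 = .8, .4                  # g saturations of Tests 2 and 5
b2 = math.sqrt(1 - a2 ** 2)      # loading of Test 2 on the shared specific (.6)
b5 = math.sqrt(1 - a5 ** 2)      # loading of Test 5 on the shared specific (about .917)

# Correlation the two tests must show if their specifics are identical.
r25 = a2 * a5 + b2 * b5


def recover_g(z2, z5):
    """Solve z2 = a2*g + b2*s and z5 = a5*g + b5*s for g by eliminating s."""
    return (b5 * z2 - b2 * z5) / (a2 * b5 - a5 * b2)


# A hypothetical person with g = 1.3 and shared specific s = -0.4.
g, s = 1.3, -0.4
z2 = a2 * g + b2 * s
z5 = a5 * g + b5 * s

print(round(r25, 3), round(recover_g(z2, z5), 6))
```

The denominator a2*b5 - a5*b2 vanishes only if the two tests have equal saturations, in which case they would be the same test and could not separate g from s.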

We see, therefore, that under certain hypothetical 
circumstances, a more exact estimate of g can be obtained 
from two of these “singly conforming” tests than the 
hierarchy with which they conform individually. Those 
circumstances are, that their correlation with one another 
(the correlation which breaks the hierarchy because it is 
too large) should either equal

\[ r_{ig}\, r_{jg} + \sqrt{(1 - r_{ig}^2)(1 - r_{jg}^2)} \]

or should approach this value.

It cannot in actual practice be expected to equal it, as 
in our artificial example. For we have disregarded errors, 
which are sure in some measure to be present. At what 
stage will the pair of singly conforming tests cease to be 
a better measure of g than the better of the two hierarchies 
made by deleting either the one or the other? If in our
example the correlation .870 of Tests 2 and 5 be imagined
to sink little by little, the correlation of their estimate
with g will sink from unity. The better of the two hier-
archies gives a multiple correlation of .922. When the
correlation $r_{25}$ has sunk from .870 to .847, these two singly
conforming tests will give the same multiple correlation,
.922. If this defect from the full .870 is due entirely to
error, then a fall to .847 corresponds to reliabilities of the
two tests of the order of magnitude of .98, if they are
equally reliable. This is a very high reliability, seldom
attained, so that in a case like our example quite a small 
admixture of error would make the singly conforming 
tests no better at estimating g than the hierarchy. We 
are here, however, neglecting the fact that error would also 
diminish the efficiency of the hierarchy. Nevertheless, the 
chance of finding a pair of singly conforming tests, highly 
reliable, and having no specifics except that which they
share, seems small, as small as the chance of finding a test
of pure g, perhaps. It might possibly turn out, however,
that a matrix of several (say t) singly conforming tests
would be practicable. Such a set would measure g exactly
if among them they added only t - 1 new specifics to the
hierarchy. Their saturations would be found by placing 
them one at a time in the hierarchy, and then their regres- 
sion on g calculated by Aitken’s method (see Chapter XIV). 
The necessity for the hierarchy in the background, in all 
this, is clear: it is there to assure us that each singly con- 
forming test is compatible with the definition of g, and to 
enable its g saturation to be calculated. 

8. The danger of “ reifying ” factors—The orthodox view 
of psychologists trained in the Spearman school is that g is, 
of all the factors of the mind, the most ubiquitous. “ All 
abilities involve more or less g,” Spearman said, although 
in some the other factors are “ so preponderant that, for 
most purposes, the g factor can be neglected.” With 
this view, the present author has always agreed, provided 
that g is interpreted as a mathematical entity only, and
judgment is suspended as to whether it is anything more 
than that. 

The suggestion, however, that g is “ mental energy,” of 
which there is only a limited amount available, but avail- 
able in any direction, and that the other factors are the 
neural machines, is one to be considered with caution. 
The word energy has a definite physical meaning. “ Mental 
energy ” may convey the meaning that the energy spoken 
of is the same as physical energy, though devoted to mental 
uses. If that meaning is accepted, innumerable difficulties 
follow, not the least being the insoluble questions of the 
connexion of body and mind, and of freewill versus 
determinism. A less obscure difficulty is that there seems 
to be no easily conceivable way in which the “ energy ” 
of the whole brain can be used in any direction indifferently, 
except by the “ neural engines ” also all taking part. The 
energy of a neurone seems to reside in it, and the passage 
of a nerve impulse along a neurone seems to resemble 
rather the burning of a very rapid fuse, than the conduction 
of electricity, say, by a wire. 

If “ mental energy ” does not mean physical energy at 
all, but is only a term coined by analogy to indicate that
the mental phenomena take place “as if” there were such
a thing as mental energy, these objections largely disappear.
Even in physical or biological science, the things which are
discussed and which appear to have a very real existence
to the scientist, such as “energy,” “electron,” “neutron,”
“ gene,” are recognized by the really capable experimenter 
as being only manners of speech, easy ways of putting into 
comparatively concrete terms what are really very abstract 
ideas. With the bulk of those studying science there exists 
always the danger that this may be taken too literally, but 
this danger does not justify us in ceasing to use such terms. 
In the same way, if terms like “ mental energy ” prove to 
be useful, and can be kept in their proper place, they may 
be justified by their utility. The danger of “ reifying ” 
such terms, or such factors as g, v, etc., is, however, very 
great. 


PART II 
MULTIPLE-FACTOR ANALYSIS
CHAPTER V
THE CENTROID METHOD 


1. Need of group factors.—The two-factor method of 
analysis, described in an earlier chapter, began with the idea 
that a matrix of correlations would ordinarily show perfect 
hierarchical order if care was taken to avoid tests which 
were “unduly similar,” i.e. very similar indeed to one 
another. If such were found coexisting in the team of 
tests, the team had to be “ purified ” by the rejection of 
one or other of the two. Later it became clear that this 
process involves the experimenter in great difficulty, for it 
subjects him to the temptation to discover “ undue simi- 
larity ” between tests after he has found that their correla- 
tion breaks the hierarchy. Moreover, whole groups of 
tests were found to fail to conform; and so group factors 
were admitted, though always, by the experimenter trained 
in that school, with reluctance and in as small a number as 
possible. It had, however, become quite clear that the 
Theory of Two Factors in its original form had been super- 
seded by a theory of many factors, although the method 
of two factors remained as an analytical device for 
indicating their presence and for isolating them in com- 
parative purity. 

Under these circumstances it is not surprising that some 
workers turned their attention to the possibility of a method 
of multiple-factor analysis, by which any matrix of test 
correlations could be analysed direct into its factors 
(Garnett, 1919a and b). It was Professor Thurstone of 
Chicago who saw that one solution to this problem could 
be reached by a generalization of Spearman’s idea of zero 
tetrad-differences. 

2. Rank of a matrix and number of factors.—We saw that 
when all the tetrad differences are zero, the correlations 
can all be explained by one general factor, a tetrad being 

formed of the intercorrelations of two tests with two other
tests, thus:

            3            4
1       $r_{13}$     $r_{14}$
2       $r_{23}$     $r_{24}$

and the tetrad-difference being

\[ r_{13} r_{24} - r_{23} r_{14}. \]


Thurstone’s idea, though rather differently expressed by 
him, can be based on a second, third, fourth . . . calcu- 
lation of certain tetrad-differences of tetrad-differences. 

To explain this, let us consider the correlation co- 
efficients which three tests make with three others : 


            4            5            6
1       $r_{14}$     $r_{15}$     $r_{16}$
2       $r_{24}$     $r_{25}$     $r_{26}$
3       $r_{34}$     $r_{35}$     $r_{36}$


This arrangement of nine correlation coefficients might 
have been called a “ nonad,” by analogy with the tetrad. 
Actually, by mathematicians, it is called a “ minor deter- 
minant of order three” or more briefly a three-rowed 
minor ; a tetrad is in this nomenclature a “ minor of order 
two.” 

We can now, on the above three-rowed determinant, 
perform the following calculation. Choose the top left
coefficient as “pivot,” and calculate the four tetrad-
differences of which it forms part, namely:

\[
\begin{matrix}
(r_{14} r_{25} - r_{24} r_{15}) & (r_{14} r_{26} - r_{24} r_{16}) \\
(r_{14} r_{35} - r_{34} r_{15}) & (r_{14} r_{36} - r_{34} r_{16})
\end{matrix}
\]

These four tetrad-differences now themselves form a 
tetrad which can be evaluated. If it is zero, we say that
the three-rowed determinant with which we started 
“ vanishes.” 

Exactly the same repeated process can be carried on with 


larger minor determinants. For example, the minor of
order four here shown vanishes:

(.26)     .32      .38      .34
 .42      .36      .62      .72
 .44      .62      .66      .46
 .45      .58      .63      .60

for its pivotal t.d.'s are

(-.0408)     .0016      .0444
  .0204      .0044     -.0300
  .0068     -.0072      .0030

and then

(-.00021216)      .00031824
  .00028288      -.00042432

and finally zero.


This process of continually calculating tetrads is called 
“pivotal condensation.” The reader should be given a 
word of warning here, that the end-result of this form of
calculation, if not zero, has to be divided by the product of 
certain powers of the pivots, to give the value of the deter- 
minant we began with. A routine method (Aitken, 1937a) 
of carrying out pivotal condensation, including division 
by the pivot at each step, is described in Chapter XIV, 
pages 201ff.* 
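Pivotal condensation as described here (without the division by the pivot that the routine method adds) can be sketched in a few lines. Exact fractions are used so that a vanishing minor gives exactly zero rather than a small rounding residue; the function name `condense` is illustrative, and the matrix is the minor of order four given above.

```python
from fractions import Fraction


def condense(m):
    """One step of pivotal condensation on the top-left pivot.

    Each new cell is the tetrad-difference formed by the pivot, the cell
    itself, and the two cells in the pivot's row and column.
    """
    p = m[0][0]
    return [[p * m[i][j] - m[i][0] * m[0][j] for j in range(1, len(m[0]))]
            for i in range(1, len(m))]


minor = [[Fraction(x) for x in row.split()] for row in (
    "0.26 0.32 0.38 0.34",
    "0.42 0.36 0.62 0.72",
    "0.44 0.62 0.66 0.46",
    "0.45 0.58 0.63 0.60",
)]

first = condense(minor)     # nine tetrad-differences on the pivot .26
second = condense(first)    # four tetrad-differences of tetrad-differences
third = condense(second)    # a single number

for stage in (first, second, third):
    print([[float(x) for x in row] for row in stage])
```

If an intermediate stage were entirely zero, the loop could stop there, the rank being the number of condensations needed to reach zeros.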

We can in this way examine the minors of orders two, 
three, four (and so on) of a correlation matrix, always 
avoiding those diagonal cells which correspond to the 
correlation of a test with itself. We may come to a point 
at which all the minors of that order vanish. Suppose these 
minors which all vanish are the minors of order five. We 
then say that the “rank ” of the correlation matrix is four 
(with the exception of the diagonal cells). There then 
exists the possibility that the “ rank ” of the whole corre- 
lation matrix can be reduced to four by inserting suitable 
quantities in the diagonal cells (see next section). The 
“rank” of a matrix is the order of its largest† non-vanishing
minor. The tests can then be analysed into as many
common factors as the above reduced rank of their corre-
lation matrix (the rank, that is to say, apart from the diag-
onal cells) plus a specific in each test.

* If the process gives, at an earlier stage than the end, a matrix
entirely composed of zeros, the rank of the original determinant is
correspondingly less, being equal to the number of condensations
needed to give zeros.

† “Largest” refers to the number of rows, not to the numerical
value.

3. Thurstone’s method used on a hierarchy.—Thurstone’s 
rule about the rank includes Spearman’s hierarchy as a 
special case, for in a hierarchy the tetrads (that is, the
minors of order two) vanish. The rank is therefore one,
and a hierarchical set of tests can be analysed into one 
common factor plus a specific in each. A simple way of 
introducing the reader to Thurstone’s hypothesis and also 
to his “ centroid ” method* of finding a set of factor satura- 
tions will be to use it first of all on the perfect Spearman 


hierarchy which we cited as an artificial example in our 
first chapter. 


Tests     1       2       3       4       5       6
1         .     .72     .63     .54     .45     .36
2       .72       .     .56     .48     .40     .32
3       .63     .56       .     .42     .35     .28
4       .54     .48     .42       .     .30     .24
5       .45     .40     .35     .30       .     .20
6       .36     .32     .28     .24     .20       .


The first step in Thurstone’s method, after the rank has 
been found, is to place in the blank diagonal cells numbers 
which will cause these cells also to partake of the same rank 
as the rest of the matrix, numbers which, for a reason which 
will become clear later, are called “ communalities.” In 
our present Spearman example that rank is one, i.e. the 
tetrads vanish. The communalities, therefore, must be 
such numbers as will make also those tetrads vanish which 
include a diagonal cell : this enables them to be calculated. 
Let us, for example, fix our attention on the communality 
of the first test, which we will designate $h_1^2$ (the reason for
the “square” will become apparent later). Then the 
tetrad formed by Tests 1 and 2 with Tests 1 and 3 is: 


* We shall see why it is called the “centroid” method in the
next chapter.

            1           3
1       $h_1^2$       .63
2         .72         .56

and the tetrad-difference has to vanish. Therefore

\[ .56\,h_1^2 - .72 \times .63 = 0, \qquad \therefore \; h_1^2 = .81. \]
Similarly all the communalities can be calculated, and 
found to be— 


.81     .64     .49     .36     .25     .16


(The observant reader will notice that they are the squares 
of the “ saturations ” of our first chapter ; but let us con- 
tinue as though we had not noticed this.) 
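The same calculation can be repeated for every diagonal cell. The sketch below assumes rank one, so that a single vanishing tetrad containing the diagonal cell fixes its value as h_i^2 = r_ij r_ik / r_jk; the names `R`, `corr` and `communality` are illustrative.

```python
# Upper triangle of the six-test hierarchy (diagonal cells blank).
R = {(1, 2): .72, (1, 3): .63, (1, 4): .54, (1, 5): .45, (1, 6): .36,
     (2, 3): .56, (2, 4): .48, (2, 5): .40, (2, 6): .32,
     (3, 4): .42, (3, 5): .35, (3, 6): .28,
     (4, 5): .30, (4, 6): .24, (5, 6): .20}


def corr(i, j):
    """Correlation of tests i and j, in whichever order they are given."""
    return R[(min(i, j), max(i, j))]


def communality(i, tests=(1, 2, 3, 4, 5, 6)):
    """Value of the diagonal cell that makes the tetrad (i, j; i, k) vanish."""
    j, k = [t for t in tests if t != i][:2]
    return corr(i, j) * corr(i, k) / corr(j, k)


communalities = [communality(i) for i in range(1, 7)]
print([round(h, 4) for h in communalities])
```

For Test 1 the pair chosen is Tests 2 and 3, which is exactly the tetrad worked above; in a perfect hierarchy any other pair would give the same value.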

The method of finding the saturations of each test with 
the first common factor is then to insert the communalities 
in the diagonal cells and add up the columns* of the 
matrix, thus: 


Original Correlation Matrix 


(.81)    .72     .63     .54     .45     .36
 .72    (.64)    .56     .48     .40     .32
 .63     .56    (.49)    .42     .35     .28
 .54     .48     .42    (.36)    .30     .24
 .45     .40     .35     .30    (.25)    .20
 .36     .32     .28     .24     .20    (.16)

3.51    3.12    2.73    2.34    1.95    1.56        Total 15.21


The column totals are then themselves added together
(15.21) and the square root taken (3.90). The “satura-
tions” of the first (and here the only) common factor
are then the columnar totals divided by this square root,
namely

3.51/3.90   3.12/3.90   2.73/3.90   2.34/3.90   1.95/3.90   1.56/3.90

or    .9      .8      .7      .6      .5      .4


as in the present instance we already know them to be.
(Very often in multiple-factor analysis the “saturation”
of a test with a factor is called the “loading,” and this is
a convenient place to introduce the new term.)

* This, the “centroid” method of finding a set of loadings, is not in
any way bound up with Thurstone’s theorem about the rank and
the number of common factors. It can be used, for example, with
unity in each diagonal cell, in which case it will give as many common
factors as there are tests, and no specific factors.
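The first centroid step on the hierarchy is short enough to write out in full. This is a sketch only, with the communalities already placed in the diagonal cells; `first_centroid_loadings` is a hypothetical name.

```python
import math

# The hierarchy with communalities inserted in the diagonal cells.
matrix = [
    [.81, .72, .63, .54, .45, .36],
    [.72, .64, .56, .48, .40, .32],
    [.63, .56, .49, .42, .35, .28],
    [.54, .48, .42, .36, .30, .24],
    [.45, .40, .35, .30, .25, .20],
    [.36, .32, .28, .24, .20, .16],
]


def first_centroid_loadings(m):
    """Column totals divided by the square root of their grand total."""
    totals = [sum(row[j] for row in m) for j in range(len(m))]
    grand_total = sum(totals)
    root = math.sqrt(grand_total)
    return [t / root for t in totals], totals, grand_total


loadings, totals, grand_total = first_centroid_loadings(matrix)
print([round(t, 2) for t in totals], round(grand_total, 2))
print([round(x, 4) for x in loadings])
```

A useful check, noted later in the chapter, is that the loadings themselves add up to the square root of the grand total.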

As applied to the hierarchical case, this method of 
finding the saturations or loadings had been devised and 
employed many years previously by Cyril Burt, though it 
is not quite clear how he would have filled in the blank 
diagonal cells (Burt, 1917, 53, footnote, and 1940, 448, 462). 
It should be explained that in actual practice Thurstone 
and his followers do not calculate the minor determinants 
to find the rank and the communality, for that would be 
too laborious. Instead they adopt various approximations, 
of which the simplest is to insert in each diagonal cell the 
highest correlation coefficient of the column (see Section

4. The second stage of the “ centroid” method.—If there is 
more than one common factor, the process goes on to 
another stage. Even with our example we can show the 
beginning of this second stage, which consists in forming 
that matrix of correlations which the first factor alone 
would produce. This is done by writing the loadings 
along the two sides of a chequer board and filling every cell
of the chequer board with the product of the loading of
that row with the loading of that column, thus:


First-factor Matrix 


         .9      .8      .7      .6      .5      .4

.9      .81     .72     .63     .54     .45     .36
.8      .72     .64     .56     .48     .40     .32
.7      .63     .56     .49     .42     .35     .28
.6      .54     .48     .42     .36     .30     .24
.5      .45     .40     .35     .30     .25     .20
.4      .36     .32     .28     .24     .20     .16


This is the “first-factor matrix,” which gives the parts of
the correlations due to the first factor. This matrix has now
to be subtracted from the original matrix to find the resi-
dues which must be explained by further common factors.

In our present example the first-factor matrix is identical
with the original matrix and the residues are all zero. Only
the one common factor is therefore required. (Of course,
the reader will understand that in a real experimental 
matrix the residues can never be expected to be exactly 
zero: one is content when they are near enough to zero to
be due to chance experimental error.) Had the rank of 
our original matrix of correlations been, however, higher 
than one, there would have been a matrix of residues. 
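The chequer-board product and the subtraction can be sketched as two small helpers. The names `factor_matrix` and `residues` are illustrative, and the data are the hierarchy with its communalities, for which every residue should vanish.

```python
def factor_matrix(loadings):
    """Chequer board: each cell is the row loading times the column loading."""
    return [[a * b for b in loadings] for a in loadings]


def residues(original, explained):
    """Cell-by-cell difference between the original and the factor matrix."""
    return [[o - e for o, e in zip(orow, erow)]
            for orow, erow in zip(original, explained)]


loadings = [.9, .8, .7, .6, .5, .4]
original = [
    [.81, .72, .63, .54, .45, .36],
    [.72, .64, .56, .48, .40, .32],
    [.63, .56, .49, .42, .35, .28],
    [.54, .48, .42, .36, .30, .24],
    [.45, .40, .35, .30, .25, .20],
    [.36, .32, .28, .24, .20, .16],
]

residual = residues(original, factor_matrix(loadings))
largest = max(abs(x) for row in residual for x in row)
print(largest)   # zero apart from floating-point rounding
```

With experimental data the largest residue would be compared with what chance error could produce, rather than with zero.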

Let us now make an artificial example with a larger 
number of common factors, say three, which we can after- 
wards use to illustrate the further stages of Thurstone’s 
method. We can do this in an illuminating manner by 
the aid of the oval diagrams described in Chapter I. 

5. A three-factor example.—In Figure 10, a diagram of the
overlapping variances of four tests, let us insert three
common factors and specifics to complete the variance of
each test to 10 (to make our arithmetical work easy). No
factor here is common to all the four tests. The factor with
a variance of 4 runs through Tests 1, 2, and 3. That with a
variance 3 runs through Tests 2, 3, and 4. That with a
variance 2 runs through Tests 1 and 4. The other factors are
specifics. The four test variances being each 10, the
correlation coefficients are written down from the overlaps
by inspection as:

        1       2       3       4
1     (.6)     .4      .4      .2
2      .4     (.7)     .7      .3
3      .4      .7     (.7)     .3
4      .2      .3      .3     (.5)
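Reading the correlations off the diagram amounts to adding up the variance two tests share and dividing by 10. The sketch below encodes the three common factors as described; the labels A, B and C and the function names are hypothetical, since the text identifies the factors only by their variances.

```python
TEST_VARIANCE = 10

# Each common factor: (variance, set of tests it runs through).
common_factors = {
    "A": (4, {1, 2, 3}),
    "B": (3, {2, 3, 4}),
    "C": (2, {1, 4}),
}


def shared_variance(i, j):
    """Variance of the common factors running through both tests i and j."""
    return sum(v for v, tests in common_factors.values()
               if i in tests and j in tests)


def correlation(i, j):
    """Overlap divided by the geometric mean of the two (equal) variances."""
    return shared_variance(i, j) / TEST_VARIANCE


# With i == j the same function gives the communality of test i.
table = [[correlation(i, j) for j in (1, 2, 3, 4)] for i in (1, 2, 3, 4)]
for row in table:
    print(row)
```

The diagonal of the printed table holds the communalities, and the specific variance of each test is whatever remains of its 10 units.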


Moreover, we can put into our matrix the communalities 
corresponding to our diagram. Each communality is, in 
fact, that fraction of the variance of a test which is not 
specific. Thus .6 of the variance of Test 1 is “communal,”
.4 being specific or “selfish.” In this way we have the
matrix above, with communalities inserted. We can now
pretend that it is an experimental matrix, ready for the 
application of Thurstone’s method, as follows : 


 (.6)      .4       .4       .2
  .4      (.7)      .7       .3        Original
  .4       .7      (.7)      .3        experimental
  .2       .3       .3      (.5)       matrix.

 1.6      2.1      2.1      1.3      = 7.1 = 2.6646²

1st Loadings    .6005    .7881    .7881    .4879    = 2.6646*

.6005    (.3606)    .4733     .4733     .2930
.7881     .4733    (.6211)    .6211     .3845      First-factor
.7881     .4733     .6211    (.6211)    .3845      matrix.
.4879     .2930     .3845     .3845    (.2380)


Here it is seen that the loadings of the first factor, when 
cross-multiplied in a chequer board, give a first-factor 
matrix which is not identical with the original experimental 
matrix, unlike the case of the former, hierarchical, matrix. 
Here (as we who made the matrix know) one factor will 
not suffice. We subtract the first-factor matrix from the 
original experimental matrix to see how much of the 
correlations still has to be explained, and how much of the 
“communalities ” or communal variances. The latter 
were— 

.6        .7        .7        .5

and of these amounts the first factor has explained

.3606     .6211     .6211     .2380


If we subtract the first-factor matrix, element by element, 
from the original experimental matrix, we get the residual 


matrix:

(.2394)    -.0733     -.0733     -.0930
-.0733     (.0789)     .0789     -.0845      First residual
-.0733      .0789     (.0789)    -.0845      matrix.
-.0930     -.0845     -.0845     (.2620)


* This check should always be applied. To avoid complication
it is not printed in the later tables. It applies to the loadings with
their temporary signs (see page 72).

To this matrix we are now going to apply exactly the same
procedure as we applied to the original experimental 
matrix, in order to find the loadings of the second factor. 
But we meet at once with a difficulty. The columns of the 
residual matrix add up exactly* to zero! This always 
happens, and is indeed a useful check on our arithmetical 
work up to this point, but it seems to stop our further 
progress. 

To get over this difficulty we change temporarily the signs 
of some of the tests in order to make a majority of the cells 
of each column of the matrix positive. The best plan is to 
change the sign of the test with most minuses in its column 
and row, and so on until there is a large majority of plus 
signs. Copy the signs on a separate paper, omitting the 
diagonal signs, which never change. Since some signs 
will change twice or thrice, use the convention that a 
plus surrounded by a ring means minus, and if then 
covered by an X means plus again. Near the end, watch 
the actual numbers, for the minus signs in a column may 
be very small. The object is to make the grand total 
a maximum, and thus take out maximum variance with 
each factor. We shall here, however, for simplicity adopt 
an easier rule, i.e. to seek out the column whose total 
regardless of signs is the largest, and then temporarily change 
the signs of variables so as to make all the signs in that 
column positive. 

The sums of the residual columns, regardless of sign, are— 


.4790     .3156     .3156     .5240


and therefore we must change the signs of tests so as to 
make all the signs in Column 4 positive ; that is, we must 
change the signs of the first three tests.† Since we change
the three row signs, as well as the three column signs, this 
will leave a block of signs unchanged, but will make 
the last column and the last row all positive. We can 
then proceed as shown overleaf. 


* When enough decimals have been retained. In practice there
may be a discrepancy in the last decimal place.

† Changing the sign of Test 4 would here have the same result,
but for uniformity of routine we stick to the letter of the rule.

   .2394     -.0733     -.0733    (-).0930
  -.0733      .0789      .0789    (-).0845      First residual
  -.0733      .0789      .0789    (-).0845      matrix with
(-).0930   (-).0845   (-).0845      .2620       changed signs.

   .1858      .1690      .1690      .5240     = 1.0478 = 1.0236²

2nd Loadings    .1815    .1651    .1651    .5119      With temporary signs.

.1815     .0329     .0300     .0300     .0929
.1651     .0300     .0273     .0273     .0845      Second-factor
.1651     .0300     .0273     .0273     .0845      matrix.
.5119     .0929     .0845     .0845     .2620

   .2065     -.1033     -.1033      .0001
  -.1033      .0516      .0516       .          Second residual
  -.1033      .0516      .0516       .          matrix.
   .0001       .          .          .


On the matrix with these temporarily changed signs we 
then operate exactly as we did on the original experimental 
matrix, and obtain second-factor loadings which (with 
temporary signs) are— 


.1815     .1651     .1651     .5119


The second-factor matrix, that is, the matrix showing 
how much correlation is due to the second factor, is then 
made on a chequer board still using the temporary signs, 
and subtracted from the previous matrix of residues (with 
its temporary signs, not with its first signs) to find the 
residues still remaining, to be explained by further factors. 
In the present instance we see that the whole variance of 
the fourth test entirely disappears, and also all the correla- 
tions in which that test is concerned.* This test, therefore, 
is fully explained by the two factors already extracted. 
Only the first three test variances remain unexhausted, 
and their correlations. Again the columns of the residual 
matrix sum exactly to zero. Following our rule, the signs 
of Tests 2 and 3 have to be temporarily changed before 
the process can continue. After these changes of sign the 


* When enough decimals are retained. We shall treat the
.0001 as zero.

second residual matrix is as follows, and the same operation
as before is again performed on it:

   .2065    (-).1033   (-).1033      .        Second residual
(-).1033      .0516      .0516       .        matrix with signs
(-).1033      .0516      .0516       .        temporarily
   .           .          .          .        changed.

   .4131      .2065      .2065       .      = .8261 = .9089²

3rd Loadings    .4545    .2272    .2272     .      With temporary signs.


With these third-factor loadings we can now calculate the 
variances and correlations due to the third factor : and we 
find these are exactly equal to the second residual matrix. 
On subtracting, the third residual matrix we obtain is 
entirely composed of zeros. (In a practical example we 
should be content if it was sufficiently small.) We thus 
find (as our construction of the artificial tests entitled us to 
expect) that the matrix of correlations can be completely 
explained by three common factors. 
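The whole extraction, including the temporary changes of sign, can be sketched as one loop. This follows the simplified rule used in this section (take the column whose total regardless of sign is largest and make all its entries positive), not the fuller rule of maximizing the grand total. The function name, the tolerance `tol` used to treat rounding residue as zero, and the bookkeeping of a cumulative sign per test are assumptions of the sketch. Because it keeps full precision, its loadings may differ from the four-decimal working above by a unit in the last place.

```python
import math


def centroid_factors(matrix, n_factors, tol=1e-9):
    """Extract n_factors centroid factors; returns (loadings, final residual).

    n_factors should not exceed the rank of the matrix, otherwise the
    grand total of an exhausted residual matrix is zero.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]     # working matrix, in temporary signs
    sign = [1] * n                     # cumulative temporary sign of each test
    loadings = []
    for _ in range(n_factors):
        # Column whose total, regardless of sign, is the largest.
        col = max(range(n), key=lambda j: sum(abs(m[i][j]) for i in range(n)))
        # Change the sign of every test with a negative entry in that column.
        flip = [-1 if m[i][col] < -tol else 1 for i in range(n)]
        m = [[flip[i] * flip[j] * m[i][j] for j in range(n)] for i in range(n)]
        sign = [s * f for s, f in zip(sign, flip)]
        # Column totals divided by the square root of the grand total.
        totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
        root = math.sqrt(sum(totals))
        temp = [t / root for t in totals]
        # Restore the proper signs before recording the loadings.
        loadings.append([s * t for s, t in zip(sign, temp)])
        # Subtract this factor's matrix, still in temporary signs.
        m = [[m[i][j] - temp[i] * temp[j] for j in range(n)] for i in range(n)]
    return loadings, m


experimental = [
    [.6, .4, .4, .2],
    [.4, .7, .7, .3],
    [.4, .7, .7, .3],
    [.2, .3, .3, .5],
]

loadings, residual = centroid_factors(experimental, 3)
for column in loadings:
    print([round(x, 4) for x in column])
print(max(abs(x) for row in residual for x in row))
```

Keeping a running sign for each test is one way of returning from temporary to proper signs; it is equivalent to writing the loadings down with temporary signs and reversing them afterwards column by column.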

After the analysis has been completed, some care is 
needed in returning from the temporary signs of the load- 
ings to the correct signs. The only safe plan is to write 
down first of all the loadings with their temporary signs 
as they came out in the analysis. In our present example 
these happen to be all positive, though that will not 
always occur. 


Loadings with Temporary Signs 
Test        I         II        III

1         .6005     .1815     .4545
2         .7881     .1651     .2272
3         .7881     .1651     .2272
4         .4879     .5119       .


Now, in obtaining Loadings II the signs of Tests 1, 2, and 
3 were changed. We must, therefore, in the above table 
reverse the signs of the loadings of these three tests in
Column II and each later column. Then in obtaining
Loadings III the signs of Tests 2 and 3 were changed; that
is, in our case changed back to positive. The loadings
with their proper signs are therefore as shown in the first 
three columns of this table : 


Loadings of the Factors (Signs Replaced) 


Test        I          II         III        Specific

1         .6005     -.1815     -.4545        .6324
2         .7881     -.1651     +.2272        .5477
3         .7881     -.1651     +.2272        .5477
4         .4879      .5119        .          .7071


In this table each column of loadings, for the common 
factors after the first, adds up to zero. The loading of the 
specific is found from the fact that in each row the sum of 
the squares must be unity, being the whole variance of the 
test. The inner product * of each pair of rows gives the
correlation between those two tests (Garnett, 1919a).
Thus—

r12 = .6005 × .7881 + .1815 × .1651 − .4545 × .2272 = .4000


in agreement with the entry in the original correlation 
matrix. With artificial data like the present, the analysis 
results in loadings which give the correlations back exactly. 
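
The check just described can be repeated for every pair of tests at once. The following is a minimal sketch in Python, not part of the original text; the names `loadings` and `inner_product` are chosen here for illustration, and the figures are the four-decimal common-factor loadings of the table above, so agreement is only to about the fourth decimal place.

```python
# Common-factor loadings I, II, III with signs replaced (one row per test).
loadings = [
    [0.6005, -0.1815, -0.4545],
    [0.7881, -0.1651, 0.2272],
    [0.7881, -0.1651, 0.2272],
    [0.4879, 0.5119, 0.0],
]


def inner_product(a, b):
    """Sum of the products in pairs of two series of numbers."""
    return sum(x * y for x, y in zip(a, b))


# Correlation of each pair of tests = inner product of their rows.
correlations = {
    (i + 1, j + 1): inner_product(loadings[i], loadings[j])
    for i in range(4)
    for j in range(i + 1, 4)
}

# Communality of each test = inner product of its row with itself.
communalities = [inner_product(row, row) for row in loadings]

for pair, r in sorted(correlations.items()):
    print(pair, round(r, 4))
print([round(h, 4) for h in communalities])
```

The printed correlations should agree with the original matrix (.4, .4, .2, .7, .3, .3) and the communalities with .6, .7, .7, .5, apart from rounding in the last figure.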

It will be seen that all the signs in any column of the 
table of loadings can be reversed without making any 
change in the inner products of the rows; that is, without 
altering the correlations. We would usually prefer, therefore,
to reverse the signs of a column like our Column III,
so as to make its largest member positive.

* By the “ inner product ” of two series of numbers is meant the
sum of their products in pairs. Thus the inner product of the two
sets :
              a    b    c    d
      and     A    B    C    D
      is      aA + bB + cC + dD

The amount which each factor contributes to the variance
of the test is indicated by the square of its loading in that
test. The sum of the squares of the three common-factor
loadings gives the “ communality ” which we originally
deduced from Figure 10 and inserted in the diagonal cells of
our original correlation matrix. These facts can be better
seen if we make a table of the squares of the above loadings :


Variance contributed by Each Factor

Test        I        II       III     Communality   Specific    Total
                                                     Variance
1         .3606    .0329    .2065       .6000         .4000       1
2         .6211    .0273    .0516       .7000         .3000       1
3         .6211    .0273    .0516       .7000         .3000       1
4         .2380    .2620      .         .5000         .5000       1

Total    1.8408    .3495    .3097      2.5000        1.5000       4


6. Comparison of the analysis with the diagram.—The 
reader has probably been turning from this calculation of 
the factor loadings back to the four-oval diagram with 
which we started (page 69), to detect any connexion; and 
has been disappointed to find none. The fact is that the 
analysis to which the Thurstone method has led us is, 
except that it too has three common factors, a different 
analysis from that which the original diagram naturally 
invites. That diagram gave for the variance due to each 
factor the following : 


Variance contributed by Each Factor

Test        I       II      III     Communality   Specific    Total
                                                   Variance
1          .4       .       .2         .6            .4         1
2          .4       .3      .          .7            .3         1
3          .4       .3      .          .7            .3         1
4          .        .3      .2         .5            .5         1

Totals    1.2       .9      .4        2.5           1.5         4


and the factor loadings are the positive square roots of
these.

Loadings of the Factors

Test        I        II       III     Specifics
1         .6325      .      .4472      .6324
2         .6325    .5477      .        .5477
3         .6325    .5477      .        .5477
4           .      .5477    .4472      .7071


The only points in common between the two analyses are 
that they both have the same communalities (and therefore 
the same specific variances) and the same number of com- 
mon factors. The Thurstone analysis has two general 
factors (running through all four tests), while the diagram 
had none: and the Thurstone analysis has several negative 
loadings, while the diagram had none. We shall see later
that Thurstone, after arriving at this first analysis,
endeavours to convert it into an analysis more like that of
our diagram, with no negative loadings and no completely
general factors. This is one of the most difficult yet 
essential parts of his method. 

7. Analysis into two common factors—When we began 
our analysis of the matrix of correlations corresponding to 
Figure 10, we simply put the communalities suggested by 
that figure into the blank diagonal cells. That served to 
illustrate the fact that the Thurstone method of calculation 
will bring out as many factors as correspond to the com- 
munalities used, here three factors. But it disregarded 
(intentionally for the purpose of the above illustration) a 
cardinal point of Thurstone’s theory that we must seek
for the communalities which make the rank of the matrix a 
minimum, and therefore the number of common factors a 
minimum. We simply accepted the communalities sug- 
gested by the diagram. Let us now repair our omission 
and see if there is not a possible analysis of these tests into 
fewer than three common factors. There is no hope of 
reducing the rank to one, for the original correlations give 
two of the three tetrads different from zero, and we may 
(in an artificial example) assume that there are no experimental
or other errors. But there is nothing in the experimental
correlations to make it certain that rank 2
cannot be attained. With only four tests (far too few, be 
it remembered, for an actual experiment) there is no minor 
of order three entirely composed of experimentally obtained 
correlations. It may then be the case that communalities 
can be found which reduce the rank to 2. Indeed, as we 
shall see presently, many sets of communalities will do so, 
of which one is shown here : 


(.2667)    .4       .4       .2
  .4      (.7)      .7       .3
  .4       .7      (.7)      .3
  .2       .3       .3      (.15)


These communalities .2667 (that is, 12/45), .7, .7, and .15 make
every three-rowed minor exactly zero. For example, the minor


(.2667)    .4       .2
  .4      (.7)      .3
  .2       .3      (.15)


becomes by “ pivotal condensation ” :


  .0267     0
   0        0

and finally 0.
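
The statement that every three-rowed minor vanishes can be checked mechanically. The sketch below is an illustration added here, not part of the original text: it enumerates all sixteen three-rowed minors of a four-test matrix and reports the largest in absolute value. The function names are hypothetical, and 4/15 is used as the exact value of the first communality (12/45).

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3 x 3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def three_rowed_minors(matrix):
    """Yield every minor formed from 3 rows and 3 columns of the matrix."""
    n = len(matrix)
    for rows in combinations(range(n), 3):
        for cols in combinations(range(n), 3):
            yield det3([[matrix[r][c] for c in cols] for r in rows])


def with_communalities(h):
    """The four-test correlation matrix with h inserted in the diagonal."""
    r = [
        [0.0, 0.4, 0.4, 0.2],
        [0.4, 0.0, 0.7, 0.3],
        [0.4, 0.7, 0.0, 0.3],
        [0.2, 0.3, 0.3, 0.0],
    ]
    for i, value in enumerate(h):
        r[i][i] = value
    return r


# Communalities of the two-factor analysis: every minor should vanish.
largest_rank_two = max(
    abs(d) for d in three_rowed_minors(with_communalities([4 / 15, 0.7, 0.7, 0.15]))
)
# Communalities taken from Figure 10: at least one minor is not zero.
largest_rank_three = max(
    abs(d) for d in three_rowed_minors(with_communalities([0.6, 0.7, 0.7, 0.5]))
)
print(largest_rank_two, largest_rank_three)
```

With the first set the largest minor is zero apart from floating-point noise, so the rank is 2; with the second set a minor of about .096 survives, so three common factors are needed.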


It must, therefore, be possible to make a four-oval
diagram, showing only two common factors, and indeed
more than one such diagram can be found. One is shown
in Figure 11.

[Figure 11.]
This gives exactly the same correlations. For example—

r23 = (12 + 2)/√(20 × 20) = 14/20 = .7

r34 = 12/√(20 × 80) = 12/40 = .3

It also gives the communalities .2667, .7, .7, .15. For
example, in Test 1, variance to the amount of 12 out of
45 is communal, and 12/45 = .2667.

The insertion of these communalities, therefore, in the 
matrix of correlations ought to give a matrix which only 
two applications of Thurstone’s calculation should com- 
pletely exhaust. The reader is advised to carry out the 
calculation as an exercise. He will find for the first-factor 
loadings— 

.5000    .8290    .8290    .3750


and if in the first residual matrix, following our rule, he 
changes temporarily the signs of Tests 2 and 3, the second- 
factor loadings will be— 


.1291    −.1128    −.1128    .0968


The second residual matrix will be found to be exactly 
zero in each of its sixteen cells. The variance (square of 
the loading) contributed by each factor to each test is then 
in this analysis : 


Variance contributed by Each Factor

Test        I        II      Communality   Specific    Total
                                            Variance
1         .2500    .0167       .2667         .7333       1
2         .6873    .0127       .7000         .3000       1
3         .6873    .0127       .7000         .3000       1
4         .1406    .0094       .1500         .8500       1

Totals   1.7652    .0515      1.8167        2.1833       4


If we now compare these analyses, we see that the three 
common factors of the previous analysis “ took out,” as 
the factorial worker says, a variance of 2.5 of the total 4,
leaving 1.5 for the specifics. The present analysis leaves
2.1833 for the specifics, which here form a larger part of
the four tests. 
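
The exercise recommended in this section can also be sketched in code. What follows is a minimal illustration, not the author's own procedure: the function names are hypothetical, and the rule used for the temporary sign changes (reflect the test whose column sum, diagonal excluded, is most negative, and repeat until none is negative) is an assumption that happens to reproduce the signs of this example. The final step makes the largest loading of the column positive, as preferred in Section 5.

```python
import math


def centroid_factor(m):
    """One centroid extraction; returns loadings with proper signs."""
    n = len(m)
    s = [1] * n  # temporary sign (+1 or -1) attached to each test
    while True:
        # Column sums of the sign-changed matrix, diagonal cell excluded.
        off = [
            s[i] * sum(s[j] * m[i][j] for j in range(n) if j != i)
            for i in range(n)
        ]
        worst = min(range(n), key=lambda i: off[i])
        if off[worst] >= -1e-12:
            break
        s[worst] = -s[worst]
    col = [s[i] * sum(s[j] * m[i][j] for j in range(n)) for i in range(n)]
    total = sum(col)
    if total <= 0:
        raise ValueError("nothing left to extract")
    loadings = [s[i] * col[i] / math.sqrt(total) for i in range(n)]
    if max(loadings, key=abs) < 0:  # make the largest member positive
        loadings = [-x for x in loadings]
    return loadings


def residual(m, loadings):
    """Subtract the variances and correlations due to one factor."""
    n = len(m)
    return [[m[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]


h1 = 12 / 45  # first communality, exact value of .2667
matrix = [
    [h1, 0.4, 0.4, 0.2],
    [0.4, 0.7, 0.7, 0.3],
    [0.4, 0.7, 0.7, 0.3],
    [0.2, 0.3, 0.3, 0.15],
]
first = centroid_factor(matrix)
first_residual = residual(matrix, first)
second = centroid_factor(first_residual)
second_residual = residual(first_residual, second)

print([round(x, 4) for x in first])
print([round(x, 4) for x in second])
print(max(abs(x) for row in second_residual for x in row))
```

Carried to full precision the second-factor loadings come out a unit or two different in the fourth decimal place from those printed above, which were worked from rounded figures; the second residual matrix vanishes to within rounding error.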

8. Alexander’s rotation.—We saw in Section 6 that the
Thurstone method there led to an analysis which was 
different from the analysis corresponding to the diagram 
with which we began. That is also the case with the 
present analysis into two common factors—the very fact 
that it gives the second factor two negative loadings shows 
this, for the diagram (Figure 11) corresponds to positive 
loadings only. We said, too, in Section 6 that a difficult 
part of Thurstone’s method was the conversion of the 
loadings into new and equivalent loadings which are all 
positive. This will form the subject of a later and more 
technical chapter ; but a simple illustration of one method 
of conversion (or “ rotation ” as it is called, for a reason 
which will become clear later) can be given from our present 
example. It is a method which can be used only if we have 
reason to think that one of our tests contains only one 
common factor (Alexander, 1935, 144). Let us suppose in 
our present case that from other sources we know this fact 
about Test 1. The centroid analysis has given us the 
loadings shown in the first two columns of this table : 


           Unrotated                           Rotated              Rotated
Test       Loadings          Communality       Loadings             Loadings
           I        II                         I*       II*         I**      II**
1        .5000     .1291        .2667        .5164      .         .4781    .1952
2        .8290   − .1128        .7000        .7746    .3162       .8367      .
3        .8290   − .1128        .7000        .7746    .3162       .8367      .
4        .3750     .0968        .1500        .3873      .         .3586    .1464


The communalities are also shown ; they are the sums of 
the squares of the loadings. If now we know or decide to 
assume that Test 1 has really only one common factor, and 
if we want to preserve the communalities shown, then the 
loading of factor I* in Test 1 must be the square root of
.2667, namely .5164.

The loadings of factor I* in the other three tests can
now be found from the fact that they must give the correlations
of those tests with Test 1, since Test 1 has no
second factor to contribute. The loadings shown in
column I* are found in this way: for example, .7746 is
the quotient of .5164 divided into r12 (.4), and .3873 is
similarly r14 (.2) divided by .5164.

The contributions of factor I* to the communalities are 
obtained by squaring these loadings. In Test 1, we 
already know that factor I* exhausts the communality, for 
that is how we found its loading. We discover that in 
Test 4, factor I* likewise exhausts the communality, for 
the square of .3873 is .1500. The other two tests, however,
have each an amount of communality remaining equal to
.1000 (i.e. .7000 − .7746²). The square root of .1000,
therefore (.3162), must be the loading of factor II* in
Tests 2 and 3. The double column of loadings ought now
to give all the correlations of the original correlation
matrix, and we find that it does so. Thus, e.g.—


      r23 = .7746 × .7746 + .3162 × .3162 = .7000
and   r24 = .7746 × .3873 = .3000


Moreover, the analysis into factors I* and II* corre- 
sponds exactly to Figure 11. For example, the loading of 
factor II* in Test 2 in that diagram is the square root of 
2/20 (.3162) ; and the loading of factor I* in Test 4 is the
square root of 12/80 (.3873).
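
The arithmetic of this rotation is short enough to set out as a sketch. The function below is an illustration only, with hypothetical names; it assumes exactly two common factors, one test known to contain only the first of them, and second-factor loadings that may all be taken as positive square roots, as in the present example.

```python
import math


def alexander_rotation(r, communalities, pure):
    """Rotate a two-factor analysis so that test `pure` has one common factor.

    r is the matrix of correlations (diagonal cells are ignored),
    communalities the sums of squares of the unrotated loadings.
    """
    n = len(r)
    anchor = math.sqrt(communalities[pure])
    # First factor: must reproduce the correlations with the pure test.
    first = [anchor if i == pure else r[pure][i] / anchor for i in range(n)]
    # Second factor: whatever communality the first factor leaves over.
    second = [
        math.sqrt(max(communalities[i] - first[i] ** 2, 0.0)) for i in range(n)
    ]
    return first, second


r = [
    [0.0, 0.4, 0.4, 0.2],
    [0.4, 0.0, 0.7, 0.3],
    [0.4, 0.7, 0.0, 0.3],
    [0.2, 0.3, 0.3, 0.0],
]
h2 = [12 / 45, 0.7, 0.7, 0.15]

one_star, two_star = alexander_rotation(r, h2, pure=0)  # Test 1 assumed pure
print([round(x, 4) for x in one_star], [round(x, 4) for x in two_star])

one_dstar, two_dstar = alexander_rotation(r, h2, pure=1)  # Test 2 assumed pure
print([round(x, 4) for x in one_dstar], [round(x, 4) for x in two_dstar])
```

The first call gives the columns I* and II* of the table, the second the columns I** and II**.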

If, however, the experimenter had reasons for thinking
that Test 2 (not Test 1) was free from the second common
factor, his “ rotation ” of the loadings would have given a
different result, shown in the table on page 79 in columns
I** and II**. This set of loadings also gives the correct
communalities and the experimental correlations, but does
not correspond to Figure 11. A diagram can, however, be
constructed to agree with it (Figure 12) and the reader is
advised to check the agreement by calculating from the
diagram the loadings of each factor, the communalities of
each test, and the correlations.

[Figure 12.]

We have had, in Figures 10, 11, and 12, three different 
analyses of the same matrix of correlations. If with 
Thurstone we decide that analyses must always use the 
minimal number of common factors, we will reject Figure 10. 
Between Figures 11 and 12, however, this principle makes 
no choice. Much of the later and more technical part of 
Thurstone’s method is taken up with his endeavours to 
lay down conditions which will make the analysis unique. 

9. Unique communalities.—The first requirement for a
unique analysis is that the set of communalities which gives 
the lowest rank should be unique, and this is not the case 
with a battery of only four tests and minimal rank 2, like 
our example. There are many different sets of com- 
munalities, all of which reduce the matrix of correlations 
of our four tests to rank 2. If, for example, we fix the 
first communality arbitrarily, say at .5, we can condense
the determinant to one of order 3 by using .5 as a pivot
(as on page 65) except that the diagonal of the smaller
matrix will be blank :


(.5)    .4     .4     .2
 .4     .      .7     .3
 .4     .7     .      .3
 .2     .3     .3     .

        .      .19    .07
        .19    .      .07
        .07    .07    .


We can then fill the diagonal of the smaller matrix with
numbers which will make each of its tetrads zero, namely—

        .19    .19    .0258

and then, working back to the original matrix, find the
communalities—

        .5     .7     .7     .1316

which make its rank exactly 2. We can similarly insert
different numbers for the first communality and calculate
different sets of communalities, any one set of which will
reduce the rank to 2. In this way we can go from 1.0
down to .22951 for the first communality without obtaining
inadmissible magnitudes for the others. Some sets
are given in the following table * :


   1         2      3       4          Sum
  1.0        .7     .7     .12963     2.52963
   .5        .7     .7     .13158     2.03158
   .3        .7     .7     .14        1.84
   .2667     .7     .7     .15        1.8167
   .25       .7     .7     .1667      1.8167
   .24       .7     .7     .20        1.84
   .22951    .7     .7    1.0         2.62951
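
The rows of this table can be recomputed by following the condensation in code. The sketch below is an added illustration with a hypothetical function name. It condenses on the chosen first communality, fills the diagonal of the smaller matrix so that its tetrads vanish, and works back to the communalities. It assumes four tests, and it breaks down when one of the condensed correlations is exactly zero (as happens at 12/45), so that value is avoided here.

```python
def communalities_for_rank_two(r, h1):
    """Communalities which, with h1 fixed for Test 1, give rank 2.

    r is the 4 x 4 matrix of correlations; its diagonal is ignored.
    """

    def condensed(j, k):
        # Off-diagonal cell of the smaller matrix, using h1 as pivot.
        return h1 * r[j][k] - r[0][j] * r[0][k]

    others = [1, 2, 3]
    result = [h1]
    for j in others:
        k, l = [t for t in others if t != j]
        # Diagonal cell which makes the tetrads of the smaller matrix zero.
        d = condensed(j, k) * condensed(j, l) / condensed(k, l)
        # Working back: d = h1 * hj - r1j squared.
        result.append((d + r[0][j] ** 2) / h1)
    return result


r = [
    [0.0, 0.4, 0.4, 0.2],
    [0.4, 0.0, 0.7, 0.3],
    [0.4, 0.7, 0.0, 0.3],
    [0.2, 0.3, 0.3, 0.0],
]
for h1 in (1.0, 0.5, 0.3, 0.25, 0.24):
    h = communalities_for_rank_two(r, h1)
    print([round(x, 5) for x in h], round(sum(h), 5))
```

Each printed line should agree with the corresponding row of the table, the communalities of Tests 2 and 3 remaining at .7 throughout.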


If, however, we search for and find a fifth test to add to 
the four, which will still permit the rank to be reduced to 
2, this fifth test will fix the communalities at some point 
or other within the above range. Suppose that this test 
gave the correlations shown in the last row and column : 


         1        2        3        4        5
1        .       .4       .4       .2      .5883
2       .4       .        .7       .3      .2852
3       .4       .7       .        .3      .2852
4       .2       .3       .3       .       .1480
5     .5883    .2852    .2852    .1480      .


If we now try to find communalities to reduce this
matrix to rank 2 (as can be done), we find only the one
set—

       .7      .7      .7      .13030      .5


* The circumstance that the communalities of Tests 2 and 3
remain fixed and alike is due to these tests being identical except for
their specific. This lightens the arithmetic, but would not occur
in practice.

† Alternatively, the communalities (which are now unique) can
be found by equating to zero those three-rowed minors which have
only one element in common with the diagonal. In this connexion
see Ledermann, 1937a.

The reader can try this by assigning an arbitrary value for
the first one,† and then condensing the matrix on the lines
employed above, when he will always find some obstacle
in the way unless he chooses .7. Try, for example, .5 for
the first communality :


(.5)       .4        .4        .2       .5883
 .4        .         .7        .3       .2852
 .4        .7        .         .3       .2852
 .2        .3        .3        .        .1480
 .5883     .2852     .2852     .1480     .

          (x)        .19       .07     −.09272
           .19       .         .07     −.09272
           .07       .07       .       −.04366
         −.09272   −.09272   −.04366     .


Now, if the upper matrix is to be of rank 2, the
second condensation must give only zeros (see footnote,
page 65). But if we fix our attention on different tetrads
in the lower matrix which contain the pivot x, we see that
they give, if they have to be zero, incompatible values for
x. Thus from one tetrad we get x = .19, from another
x = .14866. With .5 as first communality, rank 2
cannot be attained. With five tests (or more), if rank 2
can be attained at all, it can be by only one unique set of
communalities. Just as it took three tests to enable the 
saturations with Spearman’s g to be calculated, so it takes 
five tests to enable communalities due to two common 
factors to be calculated. For larger numbers of common 
factors, the number of tests required to make the set of 
communalities unique is shown in the following table
(Vectors, 77). The lower numbers* are given by the
formula—

            (2r + 1) + √(8r + 1)
      n ≥  ----------------------
                     2

r Factors    1   2   3   4   5   6   7   8   9  10  11  12
n Tests      3   5   6   8   9  10  12  13  14  15  17  18


* With six tests the communalities which reduce to rank 3 are
not necessarily unique, for there are, or there may be, two sets of
them. See Wilson and Worcester, 1939.

I think the ambiguity, which is not practically important, only
occurs when n is exactly equal to the formula, e.g. when r = 3, 6,
10, etc.
If we were actually confronted with the matrix of correlations
shown on page 69, and asked what the communalities
were which reduced it to the lowest possible rank, we would 
find it very unsatisfactory to have to guess at random and 
try each set ; and our embarrassment would be still greater 
if there were more tests in the battery, as would actually be 
the case in practice. There would also be sampling error 
(which in this our preliminary description of Thurstone’s 
method we are assuming to be non-existent). Under these 
circumstances, devices for arriving rapidly at approximate 
values of the communalities are very desirable. 

10. Method of approximating to the communalities.—Thurstone
has described many ways of estimating the communalities,
and articles still issue from his laboratory on
this subject. He points out, however, that if the number
of tests is fairly large, an exact estimate is not very important,
and can in any case be improved by iteration, using
the sums of squares of the loadings for a new estimate.
The simplest plan is to insert as each communality
the largest correlation coefficient in the column.


We shall illustrate this, the easiest, method on the same 
example as we used above, for the sake of comparison and 
for ease in arithmetical computation, even although that 
example is really an exact and artificial one unclouded by 
sampling error. Inserting then the highest coefficients in
the diagonal cells, we have :


                 (.5883)    .4        .4        .2        .5883
                  .4       (.7)       .7        .3        .2852
                  .4        .7       (.7)       .3        .2852
                  .2        .3        .3       (.3)       .1480
                  .5883     .2852     .2852     .1480    (.5883)

Sums             2.1766    2.3852    2.3852    1.2480    1.8950   = 10.0900 = 3.1765²

First Loadings    .6852     .7509     .7509     .3929     .5966

The communalities which really give the minimum rank
are, as we saw on page 82—

                  .7        .7        .7        .13030    .5

and the correct first-factor loadings obtained by their use—

                  .7257     .7564     .7564     .3420     .5729

With a large battery the difference between the loadings
obtained by the approximation and by the correct communalities
would be much less. For the “ centroid ” method
depends on the relative totals of the columns of the correla- 
tion matrix; and when there are twenty or more tests, 
these relative totals will not be seriously changed by the 
exact value given to the communality in the column. 
When the number of tests is large, the influence of the one 
communality in each column is swamped by the influence 
of the numerous correlations. 

The process now goes on as on page 71, and the residuals 
left after subtraction of the first-factor matrix check by 
summing in each column to zero, as there. 

Before, however, proceeding any farther, in this approxi- 
mate method we delete the quantities in the diagonal (the 
residues of the guessed communalities) and replace them by 
the largest coefficient in the column regardless of its sign, 
which we change to plus in the diagonal cell if it is negative 
in its own cell.* The reason for this is apparent, especially 
when, as may and does happen, the existing diagonal 
residues are negative, which is theoretically impossible. 
For although the guessing of the first communalities does 
not in a large battery make much difference to the first- 
factor loadings, it may make a big difference to the diagonal 
residues. If the battery is very large indeed, our first- 
factor loadings would come out much the same, even if we 
entered zero for every communality, but the diagonal 
residues would then all be negative. In short, the diagonal
residues are much the least trustworthy part of the calculation
when approximate communalities are used, and it is
better to delete them at each stage and make a new
approximation.

* It is necessary to keep an eye on the fact that what is inserted
must not, with the squares of the previous loadings of that test,
amount to more than unity.

11. Illustrated on the example.—To make this clearer, the 
whole approximate process is here set out for our small 
example as far as the second residual matrix. The ex- 
planations printed alongside the calculation will make 
each stage clear. It is important to form the residual 
matrices exactly as instructed, as otherwise the check of 
the columns summing to zero will not work. In practice, 
certainly if a calculating machine were being used, several 
of the matrices here printed for clearness would be omitted ; 
for example, with a machine one would go straight from
A to C, while D and E would be made by actually altering
C itself :

A.  Largest r of column inserted in diagonal cell.

                 (.5883)    .4        .4        .2        .5883
                  .4       (.7)       .7        .3        .2852
                  .4        .7       (.7)       .3        .2852
                  .2        .3        .3       (.3)       .1480
                  .5883     .2852     .2852     .1480    (.5883)

    Sums         2.1766    2.3852    2.3852    1.2480    1.8950   = 10.0900 = 3.1765²
    Loadings I    .6852     .7509     .7509     .3929     .5966   = 3.1765

B.  First-factor matrix.

     .6852       (.4695)    .5145     .5145     .2692     .4088
     .7509        .5145    (.5639)    .5639     .2950     .4480
     .7509        .5145     .5639    (.5639)    .2950     .4480
     .3929        .2692     .2950     .2950    (.1544)    .2344
     .5966        .4088     .4480     .4480     .2344    (.3559)

C.  First residual matrix, A − B.

                 (.1188)   −.1145    −.1145    −.0692     .1795
                 −.1145    (.1361)    .1361     .0050    −.1628
                 −.1145     .1361    (.1361)    .0050    −.1628
                 −.0692     .0050     .0050    (.1456)   −.0864
                  .1795    −.1628    −.1628    −.0864    (.2324)

    Check         .0001    −.0001    −.0001     .0000    −.0001   Columns check to zero.

D.  Largest r of each column (regardless of sign) inserted in
    each diagonal cell.

                 (.1795)   −.1145    −.1145    −.0692     .1795
                 −.1145    (.1628)    .1361     .0050    −.1628
                 −.1145     .1361    (.1628)    .0050    −.1628
                 −.0692     .0050     .0050    (.0864)   −.0864
                  .1795    −.1628    −.1628    −.0864    (.1795)

    Sum disregarding signs
                  .6572     .5812     .5812     .2520     .7710

E.  Signs of Tests 2, 3, and 4 changed to make largest
    column (.7710) all positive.

                 (.1795)    .1145     .1145     .0692     .1795
                  .1145    (.1628)    .1361     .0050     .1628
                  .1145     .1361    (.1628)    .0050     .1628
                  .0692     .0050     .0050    (.0864)    .0864
                  .1795     .1628     .1628     .0864    (.1795)

    Algebraic Sum .6572     .5812     .5812     .2520     .7710   = 2.8426 = 1.6860²
    Loadings II   .3898     .3447     .3447     .1495     .4573   (With temporary signs.)

F.  Second-factor matrix, using temporary signs.

     .3898       (.1519)    .1344     .1344     .0583     .1783
     .3447        .1344    (.1188)    .1188     .0515     .1576
     .3447        .1344     .1188    (.1188)    .0515     .1576
     .1495        .0583     .0515     .0515    (.0224)    .0683
     .4573        .1783     .1576     .1576     .0683    (.2091)

G.  Second residual matrix, E − F.

                 (.0276)   −.0199    −.0199     .0109     .0012
                 −.0199    (.0440)    .0173    −.0465     .0052
                 −.0199     .0173    (.0440)   −.0465     .0052
                  .0109    −.0465    −.0465    (.0640)    .0180
                  .0012     .0052     .0052     .0180   (−.0296)

    Check        −.0001     .0001     .0001    −.0001     .0000   Columns check to zero.


Notes.—It is fortuitous that all the entries in E are positive.
Usually some will be negative.
In the check for the residual matrices, a discrepancy from zero
in the last figure is often to be expected, even of three or four units
in a large matrix.
Note the negative value occurring in a diagonal cell in G.


Further stages would be carried on in the same way. 
But at each stage the residues will be examined to see if 
further analysis is worth while, by methods indicated later. 
Meanwhile let us assume in the present example that no 
more factors need be extracted. 
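
The whole of the approximate process, from A to the second-factor loadings, may be sketched as follows. This is an added illustration under stated assumptions, not the author's own working: the function names are hypothetical, the sign-changing rule (reflect the test whose column sum, diagonal excluded, is most negative, until none is negative) is one simple choice which here gives the same result as changing Tests 2, 3, and 4, and the arithmetic is carried to full precision, so that the last figure may differ by a unit or so from the four-decimal working above.

```python
import math


def centroid_factor(m):
    """One centroid extraction; returns loadings with proper signs."""
    n = len(m)
    s = [1] * n  # temporary signs
    while True:
        off = [
            s[i] * sum(s[j] * m[i][j] for j in range(n) if j != i)
            for i in range(n)
        ]
        worst = min(range(n), key=lambda i: off[i])
        if off[worst] >= -1e-12:
            break
        s[worst] = -s[worst]
    col = [s[i] * sum(s[j] * m[i][j] for j in range(n)) for i in range(n)]
    root = math.sqrt(sum(col))
    loadings = [s[i] * col[i] / root for i in range(n)]
    if max(loadings, key=abs) < 0:  # make the largest member positive
        loadings = [-x for x in loadings]
    return loadings


def residual(m, loadings):
    n = len(m)
    return [[m[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]


def guess_diagonal(m):
    """Put in each diagonal cell the largest r of its column, sign disregarded."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        out[j][j] = max(abs(m[i][j]) for i in range(n) if i != j)
    return out


r = [
    [0.0, 0.4, 0.4, 0.2, 0.5883],
    [0.4, 0.0, 0.7, 0.3, 0.2852],
    [0.4, 0.7, 0.0, 0.3, 0.2852],
    [0.2, 0.3, 0.3, 0.0, 0.1480],
    [0.5883, 0.2852, 0.2852, 0.1480, 0.0],
]
a = guess_diagonal(r)                    # matrix A
load1 = centroid_factor(a)               # Loadings I
d = guess_diagonal(residual(a, load1))   # matrix C, then D
load2 = centroid_factor(d)               # Loadings II, proper signs
approx_h2 = [x * x + y * y for x, y in zip(load1, load2)]

print([round(x, 4) for x in load1])
print([round(x, 4) for x in load2])
print([round(x, 4) for x in approx_h2])
```

The second line printed already carries the proper signs, Tests 2, 3, and 4 being negative.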

The matrix of loadings of common factors thus arrived
at is, after we have replaced the proper signs in Loadings II,
shown in the following table.

                Approximate Method              True Values
Test        I         II       Communality      Communality
1         .6852      .3898       .6214            .7000
2         .7509    − .3447       .6827            .7000
3         .7509    − .3447       .6827            .7000
4         .3929    − .1495       .1767            .1303
5         .5966      .4573       .5651            .5000

                                2.7286

The communalities .6214, etc., are the sums of the
squares of the two loadings. For comparison with the
approximate communalities thus obtained there are shown
the true values, which in this artificial case are known to
us (see Section 9). This is for instructional purposes
only—the comparison is not intended as any criticism of
Thurstone’s method of approximation. As has been
explained, this method is used only on large batteries, and
it is a very severe test indeed to employ it on a battery of
only five tests.

12. Iteration of the process to improve the communalities — 
We might now go back and begin our whole calculation 
again, using the communalities .6214, etc., arrived at by
the first approximation. This does not seem often to be 
done in practice, most workers being content with the 
approximation first arrived at. If we repeat the calcula- 
tion again and again with our present example, on each 
occasion using as communalities the sum of the squares of 
the loadings given by the preceding calculation, we get the 
following sets of closer and closer approximation to the 
true communalities* : 


                               h1²      h2²      h3²      h4²      h5²

First trial communalities     .5883    .7000    .7000    .3000    .5883
Next approximation            .6214    .6827    .6827    .1767    .5651
Next approximation            .6381    .6970    .6970    .1477    .5392
Next approximation            .6535    .7043    .7043    .1397    .5253
True values                   .7000    .7000    .7000    .1303    .5000


* In these repetitions we do not, as in the case of the first guess,
alter the diagonal cells in each matrix of residues: we retain the
diagonal residues without change.
The example has served to show how to work the
iterative method of approximating to the communalities. 
Being an artificial example, and not overlaid with sampling 
error, it has had the advantage of allowing us to compare 
the approximations with the true values. But it must be 
remembered that a real experimental matrix is not likely 
to have an exact low rank to which approximation can
converge as here. In that case the approximations will 
presumably give an indication of the low rank which the 
matrix nearly has, which it might be made to have by small 
adjustments in its elements. 

It should be pointed out that iteration of each factor 
extraction separately will not give the same result. By 
iteration of the factors one by one we mean that after the 
loadings of the first factor are obtained they are squared 
and put into the diagonal cells as new communalities, and 
this is repeated again and again until the communalities 
remain unchanged. When this point is reached, the orig- 
inal matrix of correlations has been reduced as nearly to 
rank one as is possible. 

If the residues, after removal of the first factor, are then 
(after sign-changing) treated in the same way, they in 
turn will be reduced as nearly as possible to rank one. 
And so with successive residues, each matrix of residues 
being in succession reduced as nearly as possible to rank 
one by iteration of the one summation only. This process, 
although much easier than reiterating the whole process, 
and to that extent excusable, will not give the lowest pos- 
sible rank for the whole. Consider, for example, the 
correlations of the five tests used above on page 82. When
communalities are reiterated with the first factor only,
they settle down rapidly (the reader should check this) to—

        .4571    .5421    .5421    .1261    .2729

When the residues then left are taken, and a factor taken out
and iterated, the communalities settle down to—

        .1677    .1003    .1003    .0113    .1680

The sum of these first-factor and second-factor sets is the
set—

        .6248    .6424    .6424    .1374    .4409

These, however, if inserted in the diagonal cells of the
original matrix, do not reduce it exactly to rank two, as
can be done by the true communalities—

        .7000    .7000    .7000    .1303    .5000

Iteration over two factors, as shown in the table on page 88,
produces with four repetitions the approximations—

        .6535    .7043    .7043    .1397    .5253

and (since in this artificial example rank two can be exactly
reached) would ultimately converge to the above true
values, though at the expense of much labour, for the
convergence is slow. The iteration of each factor separately,
however, would never converge to the true values. The
above values (.6248, etc.) are final, and yet do not give
rank two.
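
The check invited above, for the iteration of the first factor only, can be sketched as follows. This is an added illustration; the function name, the number of repetitions (two hundred, far more than needed), and the starting values (the largest correlation in each column, as in Section 10) are choices made here.

```python
import math


def iterate_first_factor(r, start, repetitions=200):
    """Repeat the first centroid summation, squaring the loadings each
    time to obtain the next trial communalities."""
    n = len(r)
    h = list(start)
    for _ in range(repetitions):
        # Column totals with the current trial communalities in the diagonal.
        col = [h[i] + sum(r[i][j] for j in range(n) if j != i) for i in range(n)]
        root = math.sqrt(sum(col))
        h = [(c / root) ** 2 for c in col]
    return h


r = [
    [0.0, 0.4, 0.4, 0.2, 0.5883],
    [0.4, 0.0, 0.7, 0.3, 0.2852],
    [0.4, 0.7, 0.0, 0.3, 0.2852],
    [0.2, 0.3, 0.3, 0.0, 0.1480],
    [0.5883, 0.2852, 0.2852, 0.1480, 0.0],
]
settled = iterate_first_factor(r, [0.5883, 0.7, 0.7, 0.3, 0.5883])
print([round(x, 4) for x in settled])
```

The values printed should agree with the first set quoted above to about the third decimal place, and they fall well short of the true communalities, as the text asserts.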

13. Other methods of assessing the communalities.—The
labour of finding the minimum communalities by iteration 
is so great that methods of improving the first guess are 
desirable. Medland (Pmka. 1947, 12, 101-10) has tried 
nine such methods on a correlation matrix with 68 vari- 
ables. A method entitled Centroid No. 1 method seemed 
to be best. A sub-group is chosen of from three to five 
tests which correlate most highly with the test whose 
communality is wanted. The highest correlation in each
column of the sub-group is inserted in the diagonal cell, 
and the columns summed. The grand total is also found. 
Then the estimate of h² is—

        (column total)² / (grand total)

where the numerator is the square of the column total,
and the denominator is the grand total. Thus if the correlations
of the sub-group were—


        (.72)     .72      .63      .24
         .72     (.72)     .47      .59
         .63      .47     (.63)     .41
         .24      .59      .41     (.59)

        2.31     2.50     2.14     1.83    = 8.78

the estimate of h1² would be—

        2.31² / 8.78

Clearly the same sub-group will usually serve for more than
one of its members. Thus from the above example h2²
can be estimated to be .712.
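
The estimate is easily computed for every member of the sub-group at once. The sketch below is an added illustration; the function name is hypothetical, and the diagonal cells are assumed to hold already the highest correlation of each column, as in the example.

```python
def centroid_no1_estimates(sub_group):
    """Square of each column total divided by the grand total."""
    n = len(sub_group)
    column_totals = [sum(sub_group[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(column_totals)
    return [t * t / grand_total for t in column_totals]


sub_group = [
    [0.72, 0.72, 0.63, 0.24],
    [0.72, 0.72, 0.47, 0.59],
    [0.63, 0.47, 0.63, 0.41],
    [0.24, 0.59, 0.41, 0.59],
]
estimates = centroid_no1_estimates(sub_group)
print([round(x, 3) for x in estimates])
```

The second value printed is the .712 quoted above; the first is the value of 2.31²/8.78.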

A graphical method, for which the reader is referred to 
Medland’s article, was about equally accurate but more 
laborious. Rosner (Pmka. 1948, 13, 181-4) gives an algebraic
solution for the communalities depending upon the
Cayley-Hamilton theorem that any square matrix satisfies 
its own characteristic equation, but adds that the method 
“is not at all suited for practical purposes. The com- 
putational labour is prohibitive.” It is, however, interest- 
ing theoretically and may suggest new advances. 
CHAPTER VI
THE GEOMETRICAL PICTURE 


1. The scatter-diagram of two tests.—A well-known way
of representing correlation, and that used by Sir Francis
Galton who devised correlation coefficients, is by a scatter-diagram.
The scores in two tests are used as rectangular
abscissae and ordinates, and each person represented by a
dot. Thus, if a person makes a score of X = 72 in a Test 1
and of Y = 59 in a Test 2, he is represented by the point P.
The two tests are represented by the rectangular axes.
If a large number of persons take the two tests, their points
form the “ scatter-diagram,” looking like a lot of shots at a
target. The dots are most densely crowded together near
a point whose ordinates are the average scores in the two
tests. If there is no correlation between the two tests,
and suitable units are used, the dots will thin out equally
in all directions, forming a circular-shaped group. If, on
the other hand, there is correlation, the group of dots will
be elliptical in appearance, with an axis slanting-wise
inclined to the test lines ; and more and more elliptical
the closer the resemblance of the scores, the higher, that
is, the correlation. If we have first standardized the scores,
the test lines will pass through the centre of the group, the
average, and the axis of the ellipse will be equally inclined
to both tests. In Figure 14 it is indicated how the elliptical
group of dots narrows in the one direction, and
lengthens in the other, with increasing correlation. The
circle corresponds to zero correlation, the fat ellipse to
r = .5, the long thin one to r = .9. In perfect correlation
all the dots would be on a line. In negative correlation
the ellipse would be slanting the other way. These
ellipses must not be looked upon as bounding the group of
dots, which thins out to an indefinite distance. They are
like contours of a hill, being, in fact, “ contours ” of the
density of the dots.

[Figure 13. Axes: Test 1 and Test 2.]

[Figure 14. Axes: Test 1 and Test 2.]

2. Three tests—When we have three tests we need 
three rectangular axes, like the three lines which meet in 
the corner of a room. A person’s three scores, measured 
along these lines, define a point in solid space, a point in 
the room. The points thus representing a large number
of persons will form a swarm in the room, congregated 
most thickly round the man who is average in all three 
tests, like a swarm of bees round the queen. If there is no 
correlation between any of the tests and suitable units are 
used, the swarm will be globular, but if there is correlation
it will lengthen into an ellipsoidal shape like a Rugby
football or a Zeppelin, though its waistline need not be
circular. In place of the ellipses of the two-dimensional
figure, we now have ellipsoidal shells of equal density of the
dots representing persons. One such is shown in Figure
15, which the reader can imagine as being the room in
which he is seated, the test lines, in their positive halves,
being represented by the three edges of floor and walls
which meet in a corner, where the point representing the
average man is placed. The ellipsoidal swarm is then
partly in the room, partly outside and below it. The part
of the swarm in the room (in the positive octant, that is) is
composed of persons scoring above the average in all three
tests. The end of the major axis of the ellipsoid, that is,
the longest line that […]

[Figure 15. An ellipsoidal shell of equal density placed at the corner of a room, whose three edges serve as the test lines.]

[…] floor of the room, will be a correlational ellipse due to the
two tests edging that wall, or edging the floor. These
three silhouettes will in general be different, depending on
the adiposity, as it were, of the ellipsoid.

When we have more than three tests we cannot make or 
easily imagine a similar model, for we know in real life 
only space of three dimensions. But mathematically we 
still can conceive of as many rectangular axes as there are 
tests, in a “space” of more dimensions, of as many
dimensions, indeed, as the number of tests. And we still 
speak of the “ ellipsoidal ” shape of the swarm of persons. 

3. The four quadrants.—Let us now return to the case 
of two tests. If the persons tested are numerous it will,
with most tests, be found that the numbers in the two
quadrants marked a are approximately equal (the axes
being drawn, it is understood, through the average score
of each test) and, similarly, the numbers in the two
quadrants marked b in the figure.

ERRATA (FIFTH EDITION)

The last complete sentence on page 94 and the last
sentence of section 5 on page 97 are incorrect and should
be deleted. The major axis is not equally inclined, in
general, to the orthogonal test lines.

A portion a of the crowd of persons, that is, get scores
above the average in each test, and an equal portion a are
below the average in each. These people add to the
correlation between the tests, whereas the others, in the
b quadrants, are all good in one but bad in the other test
and detract from the correlation. It can then be shown
(Sheppard, Phil. Trans. Roy. Soc. 1898) that the correlation
coefficient r is given by

    r = cos θ,  where θ = b/(a + b) × 180°.

[…]

    r = cos θ = cos (… × 180°) = cos 60° = 0.5

Actual correlation tables will, of course, not show such
complete equality in the opposite quadrants, and, moreover,
the reader must beware of applying this formula
unless the dividing lines are drawn through the means.
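
A reader with a computer at hand may care to check the quadrant rule numerically. The following is a minimal sketch only, assuming Sheppard's rule in the form r = cos θ with θ = b/(a + b) × 180°, where a and b are the numbers of persons in the "like" and "unlike" quadrants. The function name sheppard_r and the counts used are illustrative assumptions, not figures from the text.

```python
import math


def sheppard_r(a, b):
    """Estimate r from quadrant counts, the axes being drawn through the means.

    a: persons above the average in both tests or below in both
    b: persons above the average in one test and below in the other
    """
    theta_degrees = 180.0 * b / (a + b)
    return math.cos(math.radians(theta_degrees))


# Hypothetical counts: one third of the crowd lies in the b quadrants,
# so theta is 60 degrees.
r_example = sheppard_r(a=2000, b=1000)

# Equal numbers in the a and b quadrants give theta = 90 degrees.
r_zero = sheppard_r(a=500, b=500)
```

As the text warns, the rule is only to be trusted when the dividing lines pass through the means and the opposite quadrants are nearly equal.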

4. Making the crowd circular. —We are next going to 
make a change in our model by rotating the two test
vectors, hitherto at right angles, towards one another until
the angle between them is the above angle θ, whose cosine
is the correlation coefficient. A person’s point P will still
be located at the point where the two perpendiculars from
his scores meet. The rotation of the test lines towards one
another, pivoted on the average man at the point where
they cross, will, however, move the dots representing persons,
and move them in such a way that the elliptical
shape of the crowd disappears and it becomes circular.
The presence of correlation is not now shown by the
configuration of the crowd, but by the angle between the test
lines. The cosine of this angle is the correlation coefficient.

If we guide the eye by drawing a dotted line at right
angles to each test line, we see that our former quadrants
a and b are now represented by sectors of the circular
crowd. Perpendiculars from any point in the white sectors
a on to the test lines both fall on the same side of the
average: all persons situated in these sectors are either
above the average in both tests (like P) or below in both.
Anyone, on the other hand, whose point is in the shaded
sector b is above the average in one of the tests and below
in the other. Those in a add to the correlation, those in b
diminish it. If correlation is perfect, the two test lines
must be brought together until they coincide: and then
the dotted lines will also coincide and the sector b will
disappear. If, on the other hand, the correlation is low,
the test lines will have to be farther apart, and the sector b
will increase, until, when correlation is zero, the test lines
are at right angles and the sectors a and b are equal
and balance one another, the pros equal to the cons.
For negative correlation the angle θ between the test
lines becomes obtuse, and the sectors b larger than the
sectors a.
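
The circular crowd can be imitated by simulation. The sketch below assumes a crowd of points scattered normally and equally in all directions about the average man, with two test lines set at the angle whose cosine is a chosen r; each score is the foot of the perpendicular from the point on to the test line. The names scores_from_circular_crowd and pearson, the crowd size, and the seed are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def scores_from_circular_crowd(r, n=100_000, seed=0):
    """Project a circular crowd on to two test lines whose angle has cosine r."""
    rng = random.Random(seed)
    theta = math.acos(r)
    line1 = (1.0, 0.0)                          # unit vector along Test 1
    line2 = (math.cos(theta), math.sin(theta))  # unit vector along Test 2
    test1, test2 = [], []
    for _ in range(n):
        px, py = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # one person's dot
        test1.append(px * line1[0] + py * line1[1])
        test2.append(px * line2[0] + py * line2[1])
    return test1, test2


test1, test2 = scores_from_circular_crowd(r=0.5)
observed_r = pearson(test1, test2)

# Fraction of the crowd in the b sectors (above average in one test only).
unlike_fraction = sum(1 for x, y in zip(test1, test2) if x * y < 0) / len(test1)
```

With r = .5 the angle is 60°, so about one third of the crowd should fall in the b sectors, in agreement with the quadrant rule of Section 3.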

5. Ellipsoid into sphere.—With three tests we saw that 
the solid “ scatter-diagram,” made with the test lines at 
right angles to one another, was ellipsoidal in form. Just 
as we converted the elliptical two-dimensional scatter- 
diagram into a circular crowd of dots by bringing the test 
lines closer together, until the cosine of the angle between 
them equalled the correlation coefficient, so with the ellip- 
soidal swarm of dots when we have three tests. If we take 
hold of the three test lines and swivel them nearer to each 
other, until the angle between each pair represents their
correlation coefficient by its cosine, we then find that the
ellipsoid has become a sphere. Moreover, the long major axis
of the ellipsoid is not, in general, equally inclined to the
three test lines, whether in their rectangular or in their new
positions, unless, indeed, all three correlations are exactly
equal.

6. A wire model.—Let us suppose we want to make a 
wire model of this arrangement of three test lines, supposing 
that we have calculated by the usual product-moment 
formula the three correlation coefficients. Choosing any 
two of the tests, we find from a table of cosines what angle 
has a cosine equal to their correlation coefficient, and we 
lay two straight wires on the table crossing one another at
this angle, like an X. Imagine them soldered together at
the point where they cross, which represents the man
average in each test.

Now consider the third test, and look up the angles
whose cosines equal its correlation coefficients with Tests
1 and 2. The wire for this third test must be so placed as
to make these angles with the first two wires—and we find 
that it will not lie flat on the table but sticks up at an angle, 
and its negative half has to go through the table and stick 
out below it. If we solder the three wires together where 
they cross (at the point representing the man who gets 
the average score in each of the three tests) and pick them 
up, they form a double tripod. 
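
The wire model can also be worked out by calculation instead of solder. The sketch below lays the first wire along the table, the second in the table top at the proper angle, and solves for the direction of the third. The correlations .4, .4 and .7 are those of the first three tests in the matrix of Chapter VII; the function names are illustrative assumptions, and the sketch assumes the first two tests are not perfectly correlated.

```python
import math


def wire_directions(r12, r13, r23):
    """Unit vectors for three test wires whose pairwise cosines equal the correlations.

    Wire 1 lies along the x axis and wire 2 in the table top (z = 0).
    """
    w1 = (1.0, 0.0, 0.0)
    s12 = math.sqrt(1.0 - r12 ** 2)
    w2 = (r12, s12, 0.0)
    y3 = (r23 - r12 * r13) / s12
    z3_squared = 1.0 - r13 ** 2 - y3 ** 2
    if z3_squared < 0.0:
        raise ValueError("these three correlations admit no real wire model")
    w3 = (r13, y3, math.sqrt(z3_squared))  # sticks up out of the table
    return w1, w2, w3


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle_in_degrees(r):
    """The angle one would look up in a table of cosines."""
    return math.degrees(math.acos(r))


w1, w2, w3 = wire_directions(0.4, 0.4, 0.7)
angles = [angle_in_degrees(r) for r in (0.4, 0.4, 0.7)]
```

The third component of w3 is the height to which the third wire rises above the table; it vanishes only when the three wires can all lie flat.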

7. Two kinds of space.—It will be seen that we have
described two geometrical ways of representing correlation
using two different spaces. In the one kind of space, the
test lines are at right angles to one another, orthogonal,
and the presence of correlation is shown by the fact that
the swarm of dots representing persons is not spherical but
ellipsoidal.

In the other kind of space, the crowd of dots representing
persons is spherical and the presence of correlation is
shown by the test lines not being orthogonal but at angles
with one another whose cosines equal the correlation
coefficients.

In both kinds of space, a person’s scores in the tests are 
found by dropping perpendiculars from his point on to the 
test lines. The distances of the feet of these perpendiculars 
from the origin—that is, from the point where the test lines 
cross—are his scores in the tests. 

If the test lines in this second kind of space are swivelled 
back into orthogonality, the person-points will move, will 
cease to be spherical in contour, and become ellipsoidal. 
All this is true, not only for three-dimensional space, when 
we have only three tests, but for multi-dimensional space 
needed to represent many tests and their inter-correlations. 
The algebra is exactly the same for any number of dimen- 
sions, and we continue, in the larger spaces, to use by 
analogy the terms we are accustomed to in real space, such 
as sphere, ellipsoid, etc.

8. A still larger space.—Another way of arriving at the
second of the above two kinds of space (the spherical one,
in which the cosines equal the correlation coefficients) is
to begin with a much larger space, of as many dimensions
as there are persons, who are therein represented by
orthogonal axes. If along each person’s axis we set off
the score he gets in a given test, say Test 1, these abscissae
will define a point in the space representing that test. In
the same way each test can be represented by a point. It
is a scatter-diagram with the usual rôles of tests and persons
exchanged.

These test points will usually be much less numerous than 
the persons, and they define a sub-space of dimensions 
equal to the number of tests. This sub-space, if the test 
scores have been normalized,* is the same as our spherical 
space, and the lines joining the origin to the test points are 
our former lines, separated by angles whose cosines equal 
the correlation coefficients. 
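
The statement that the test lines in the person space are separated by angles whose cosines equal the correlations can be verified on a small scale. The sketch below takes "normalized" to mean scores measured from their mean and scaled so that their squares sum to unity; this reading of the footnote on page 6, which is not reproduced here, is an assumption, as are the five persons' scores and the function names.

```python
import math


def normalize(scores):
    """Deviations from the mean, scaled to unit sum of squares."""
    mean = sum(scores) / len(scores)
    deviations = [s - mean for s in scores]
    length = math.sqrt(sum(d * d for d in deviations))
    return [d / length for d in deviations]


def cosine_between(unit_u, unit_v):
    """Cosine of the angle between two unit vectors in the person space."""
    return sum(u * v for u, v in zip(unit_u, unit_v))


def product_moment(xs, ys):
    """The usual product-moment correlation, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of five persons; each person is one axis of the space.
test1 = [72, 65, 80, 58, 75]
test2 = [59, 60, 70, 50, 66]

cos_angle = cosine_between(normalize(test1), normalize(test2))
r = product_moment(test1, test2)
```

Each test is here a single point in a space of five dimensions, and the line from the origin to that point is its test line.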

* See footnote, page 6.

9. Factor axes.—The problem of factorial analysis is to
decide upon a set of axes to use in the space in which the
test lines exist. Let us explain this first of all in the
simplest case, that of two tests, represented by their lines
in a plane, at the angle corresponding to their correlation.

In this case, the most natural way of drawing orthogonal
axes on the paper is to place one of them (see Figure 17)
half-way between the test vectors, and the other, of course,
at right angles to the first. Of these two factor axes, OA is
as near as it can be to both test lines.

[Figure 17. Two test vectors in a plane, with the orthogonal factor axes OA and OB; OA lies half-way between the test vectors.]

We pictured, before, a swarm of ten thousand dots on
the paper, each representing a person by his scores in the
two tests, found by dropping perpendiculars from his dot
to the two vectors. Instead of describing each point (each
person, that is) by the two test scores, it is clear that we
could describe it by the two factor scores, the feet of
perpendiculars on to the factor axes. It is also clear
that, as far as this purpose goes, we might have taken
our factor axes anywhere, and not necessarily in the
positions OA and OB, provided they went through the point O
and were at right angles. In other words, we can “rotate”
OA and OB round the point O, and any position is equally 
good for describing the crowd of persons. Either of the 
tests, indeed, might be made one of the factors. The 
positions shown in Figure 17 are advantageous only if we 
want to use only one of our factors and discard the other, 
in which case obviously OA is the one to keep, as it lies 
as near as possible to both test axes. The scores along OA 
are the best possible single description of the two test 
results. 
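
The freedom to rotate OA and OB, and the special merit of the half-way position, can both be seen in a short calculation. The sketch below takes two unit test vectors whose angle has cosine r, expresses them by their loadings on a pair of orthogonal axes turned through any angle, and shows that the loadings always reproduce r while the share taken by the first axis is greatest when it bisects the test vectors. The function names and the value r = .5 are illustrative assumptions.

```python
import math


def loadings_on_rotated_axes(r, rotation_degrees=0.0):
    """Loadings (on OA, on OB) of two unit test vectors.

    With rotation_degrees = 0, OA lies half-way between the test vectors.
    """
    half_angle = math.acos(r) / 2.0
    rotation = math.radians(rotation_degrees)
    # Directions of the two test vectors measured from the axis OA.
    directions = (-half_angle - rotation, half_angle - rotation)
    return [(math.cos(d), math.sin(d)) for d in directions]


def reproduced_correlation(loadings):
    (a1, b1), (a2, b2) = loadings
    return a1 * a2 + b1 * b2


def share_of_first_axis(loadings):
    """Sum of squared loadings on OA."""
    return sum(a * a for a, _ in loadings)


bisecting = loadings_on_rotated_axes(0.5)
turned = loadings_on_rotated_axes(0.5, rotation_degrees=20.0)
```

Any rotation describes the crowd equally well, but only the bisecting position makes OA the best single description of the two tests.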

10. Spearman axes for two tests.—The orthogonal axes
chosen by Spearman for his factors are, however, none of 
the positions to which OA and OB can be rotated in the 
plane of the paper. Besides, Spearman has three factors, 
and therefore three axes, for two tests, namely the general 
factor and the two specific factors, and we cannot have 
three orthogonal axes or factor vectors on a sheet of paper. 
The Spearman factors must, for two tests, lie in three- 
dimensional space, like the three lines which meet in the 
corner of a room. If we rotate the OA and OB of Figure 17 
out of the plane of the paper (say, pushing A below the 
surface of the paper, and, say, raising B above it), we shall 
clearly have to add a third axis, at right angles to OA and
OB, to enable us to describe the tests and the persons who
remain on the paper. There are now three axes to rotate ; 
and they must rotate rigidly, remaining at right angles to 
one another. The point at which Spearman stops the 
rotation, and decides that the lines then represent the 
“ best ” factors, is a position in which one of the axes is 
at right angles to Test X, and another is at right angles to 
Test Y. The third axis then represents g. 

11. Spearman axes for four tests.—We are accustomed to
depicting three dimensions on a flat sheet of paper, and
so we can, in Figure 18, represent the Spearman axes g, s₁,
and s₂ for two tests. And since we have begun to depict
other dimensions, by means of perspective, on a flat sheet,
let us continue the process and by a kind of super-perspective
imagine that the lines s₃, s₄, and any others we
may care to add, represent axes sticking out into a fourth,
a fifth, and higher dimensions. Figure 18 thus represents
the five Spearman axes for four tests, of which only the
line of the first test is shown (in its positive half only).
All the five lines g, s₁, s₂, s₃, and s₄ must be imagined as
being each at right angles to all the others in five-dimensional
space. The line of Test 1, shown in the diagram, lies in the
plane or wall edged by g and s₁. It forms acute angles with g
and with s₁, the cosines of which angles are its saturations
with g and s₁ respectively. If it had been highly saturated
with g, it would have leaned nearer to g and farther away
from s₁.

[Figure 18. The Spearman axes g, s₁, s₂, s₃, and s₄, with the line of Test 1 in the plane edged by g and s₁.]

The other three axes, s₂, s₃, and s₄, are all at right angles
to the wall or plane in which Test 1 lies. They have,
therefore, no correlation with Test 1, no share in its
composition. Test line 2 similarly lies in the wall edged
by g and s₂, test line 3 in that edged by g and s₃. The
axis g forms a common edge to all these planes. If the
battery of tests is hierarchical (that is, if the tetrad-
differences are all zero) then all the tests of the battery
can be depicted in this way, each in its own plane at right
angles to all the other planes, no test line being in the
spaces between the “walls.”

The four test lines themselves, of course, are only in 
a four-dimensional space (a 4-space we shall say, for 
brevity). Just as, when we were discussing Figure 17, we 
said that Spearman used three axes which were all out of 
the plane of the paper, so here in Figure 18, with four test 
lines (only one shown) in a 4-space, Spearman uses five 
axes in a space of one dimension higher than the number 
of tests. For n hierarchical tests, Spearman’s factors are 
in an (n + 1)-space.
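
The arrangement of n hierarchical tests in an (n + 1)-space can be set out in figures. The sketch below places each test line in the plane of g and its own specific axis, with the cosine of its angle to g equal to its saturation, and then confirms that the correlations so produced have zero tetrad-differences. The four saturations are hypothetical values chosen for illustration, and the function names are assumptions.

```python
import math


def spearman_test_vectors(saturations):
    """Unit test vectors on the axes (g, s1, ..., sn), one specific axis per test."""
    n = len(saturations)
    vectors = []
    for i, g_loading in enumerate(saturations):
        vector = [0.0] * (n + 1)
        vector[0] = g_loading                                # share of g
        vector[i + 1] = math.sqrt(1.0 - g_loading ** 2)      # share of its own specific
        vectors.append(vector)
    return vectors


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


saturations = [0.9, 0.8, 0.7, 0.6]  # hypothetical saturations with g
vectors = spearman_test_vectors(saturations)

# The cosine of the angle between two test lines is their correlation.
r = [[dot(u, v) for v in vectors] for u in vectors]

# One of the tetrad-differences of the four tests.
tetrad = r[0][1] * r[2][3] - r[0][2] * r[1][3]
```

Since each test shares only the g axis with the others, every correlation is the product of two saturations, and the tetrad-differences vanish of themselves.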

If along each test line we measure the same distance
as a unit, then perpendiculars from these points* on to the
g axis will give the saturations of the tests with g as fractions
of this unit distance. The four dots on the g axis in Figure
18 may thus be taken as representing the test vectors†
projected on to the “common-factor space,” which is here
a line, a space of one dimension only. Thurstone’s system
is like Spearman’s except that the common-factor space is
of more dimensions, as many as there are common factors.
Figure 19 shows the Thurstone axes for four tests whose
matrix of correlation coefficients can be reduced to rank 2.

12. A common-factor space of two dimensions.—Here there
are two common factors, a and b, and four specifics, s₁,
s₂, s₃, and s₄. All the six axes representing these factors
in the figure are to be imagined as existing in a 6-space,
each at right angles to all the others. The common-factor
space is here two-dimensional, the plane or wall edged by a
and b; to make it stand out in the figure, a door and a
window have been sketched upon it.

[Figure 19. The Thurstone axes for four tests: the common-factor plane ab, drawn as a wall with a door and a window, and the specific axes s₁ to s₄.]

In Spearman’s Figure 18, each test line lay in a plane
defined by g and one of the specific axes. Here in Figure
19, each test line lies in a different 3-space. These different
3-spaces have nothing in common with one another except
the plane ab, the wall with the door and window in the
diagram. In Figure 18 the projections of the unit test
vectors on to the common-factor space were lines which all
coincided in direction (though they were of different
lengths), for there the common-factor space was a line.
Here the common-factor space is a plane, and the
projections of the four test vectors on to that plane are shown
in the figure by the numbered lines on the “wall.” These
lines, if they are all projections of vectors of unit length,
will by their lengths on the wall represent the square roots
of the communalities.

* These points are then the same as those arrived at by the process
described in Section 8 (page 99).

† A vector is a direction with a magnitude, and now that we have
measured unit distance along each test line, we may speak of unit
test vectors.

13. The common-factor space in general.—When there are
r common factors, the common-factor space is of r dimensions,
and the whole factor space (including the specifics) is
of (n + r) dimensions. The test vectors themselves are in an
n-space; their projections on to the common-factor space
are crowded into an r-space, and are naturally at smaller
angles with one another than the actual test vectors are.
These angles between the projected test vectors do not, 
therefore, represent by their cosines the correlations be- 
tween the tests. The angles are too small for that, and 
the cosines, therefore, too large. But if we multiply the 
cosine of such an angle by the lengths of the two projections 
which it lies between, we again arrive at the correlation. 

Thus in Figure 19, the angle between the lines 1 and 3 
on the wall is less than the angle between the actual test 
vectors 1 and 3 out in the 6-space, of which the lines on 
the wall are the projections. But the lengths of the lines 1 
and 3 on the wall are less than the unit length we marked 
off on the actual vectors, being, in fact, the roots of the
communalities. If we call these lengths on the wall h₁ and h₃,
then the product h₁h₃ times the cosine of the projected
angle again gives the correlation coefficient.
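
The relation between the projected angle and the correlation may be checked on two tests. The sketch below builds Tests 1 and 3 as unit vectors in the 6-space of two common factors and four specifics, finds their correlation from the full vectors, and compares it with h₁h₃ times the cosine of the angle between their projections on the wall. The loadings on a and b are hypothetical values, and the helper names are assumptions.

```python
import math

# Hypothetical loadings of Tests 1 and 3 on the common factors a and b.
loadings = {1: (0.8, 0.3), 3: (0.4, 0.6)}


def full_vector(test, n_tests=4):
    """Unit test vector on the axes (a, b, s1, s2, s3, s4)."""
    a, b = loadings[test]
    vector = [a, b] + [0.0] * n_tests
    vector[1 + test] = math.sqrt(1.0 - a * a - b * b)  # its own specific
    return vector


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Correlation: cosine of the angle between the actual unit test vectors.
r13 = dot(full_vector(1), full_vector(3))

# Lengths of the projections on the wall: roots of the communalities.
h1 = math.hypot(*loadings[1])
h3 = math.hypot(*loadings[3])

# Cosine of the angle between the projected lines 1 and 3 on the wall.
cos_projected = dot(loadings[1], loadings[3]) / (h1 * h3)
```

The projected cosine is the larger of the two, the projected angle being the smaller, and it is brought back to the correlation on multiplying by the two lengths.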

14. Rotations.—It will be remembered that Thurstone, 
after obtaining a set of loadings for the common factors 
by his method of analysis of the matrix of correlations, 
“rotates ” the axes until the loadings are all positive— 
and he also likes to make as many of them as possible zero. 
It is instructive to look at this procedure in the light of our 
geometrical picture from which the phrase “ rotating the 
factors ” is taken. It should be emphasized first of all 
that such rotation of the common-factor axes in Thur- 
stone’s system must take place entirely within the com- 
mon-factor space, and the common-factor axes must not 
leave that space and encroach upon the specifics. In
Figure 18, therefore, no rotation, in Thurstone’s sense, of
the g axis can be made (since the common-factor space is a
line), except, indeed, reversing its direction and measuring
stupidity instead of intelligence.

In Figure 19 the common-factor space is a plane, and 
the axes a and b can be rotated in this plane, like the hands 
of a clock fixed permanently at right angles to one another. 
When the positive directions of a and b enclose all the 
vector projections, as they do in our figure, then all the 
loadings are positive. The position shown would, therefore,
fulfil this desire of Thurstone’s. Moreover, one of
the loadings could be made zero, by rotating a and b until 
a coincides with line 1 (when b will have no loading in 
Test 1), or until b coincides with line 4 (when a will have 
no loading in Test 4). 
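
A rotation of this kind is easily carried out in figures. The sketch below turns the axes a and b in their plane until a coincides with line 1, and shows that b then has no loading in Test 1 while the communalities are untouched. The four pairs of loadings are hypothetical, chosen so that the positive directions of a and b enclose all the projections; the function name is an assumption.

```python
import math


def rotate_axes(loadings, angle):
    """Loadings on axes a and b after turning both axes through `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


# Hypothetical loadings (on a, on b) of Tests 1 to 4.
loadings = [(0.8, 0.3), (0.7, 0.4), (0.4, 0.6), (0.3, 0.7)]

# Turn the axes until a lies along the projection of Test 1.
angle_to_line_1 = math.atan2(loadings[0][1], loadings[0][0])
rotated = rotate_axes(loadings, angle_to_line_1)

communalities_before = [a * a + b * b for a, b in loadings]
communalities_after = [a * a + b * b for a, b in rotated]
```

In this example all the rotated loadings remain positive, since line 1 is the projection nearest to the a axis.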

When there are three common factors, the common-factor
space is an ordinary 3-space. The three common-factor
axes divide this space into eight octants. Rotating
them until all the loadings are positive means until all the
projections of the test vectors are within the positive
octant. This will always be nearly possible if the correlations
are all positive. Moreover, it is clear that we can
always make at any rate some loadings zero. In the
common-factor 3-space we can move one of the axes until
it is at right angles to two of the test projections, in which
tests that factor will then have no loading. Keeping that
axis fixed, we can then rotate the other two axes round it,
seeking for a position where one of them is at right angles
to some test. The number of zero loadings obtainable 
will clearly be limited unless the configuration of the test 
vectors happens to lend itself to many zeros. We shall see 
later that Thurstone seeks for teams of tests which do this. 

Although Thurstone makes his rotations exclusively 
within the common-factor space, keeping the specifics 
sacrosanct at their maximum variance, there is, of course, 
nothing to prevent anyone who does not hold his views 
from rotating the common-factor axes into a wider space,
and increasing the number of common-factor axes at the
expense of the specific variance, until ultimately we reach as
many common factors as we have tests, and no specifics.

15. The geometrical picture of centroid analysis.—Think
of a sheaf of lines representing a number of tests, with
angles corresponding to the correlations. Centroid analysis
means (if unities are used in the diagonal cells) finding a
line in the middle of this sheaf, at the centroid or resultant,
something like the stick in the middle of the ribs of a
slightly opened umbrella, except that our test lines are not
regularly spaced like those ribs. All this is in a space of as
many dimensions as there are tests, and it is not possible to
make a drawing. But if the reader will be tolerant, we
can make one of our “super-perspective” drawings showing
a sheaf of test lines (see Figure 20) which must be
imagined as being in a multi-dimensional space. The
centroid line OC is the line along which the point O would
move if each test line were a force, all equal, pulling O. It is
exactly like the parallelogram of forces on a multi-dimensional
scale. The dots on the test lines are at unit distance
from O. (They have been joined by lines only in order
to make the figure look more solid.) The loadings of the
tests in the first centroid factor are the projections of these
unit distances on to OC; this is when unities are used in
the diagonal cells. The summation process gives,
arithmetically, these projected distances along OC.

[Figure 20. A sheaf of five test lines from O, with the centroid line OC and a plane at right angles to OC on to which the test lines are projected.]
The next part of the arithmetical process consisted in
removing that part of the correlation coefficients explained
by the first factor loadings. This means, in our space
diagram, that the dimension parallel to OC is abolished,
and all the test lines are projected on to a space at right
angles to OC and of one dimension less than the original,
(n − 1) dimensions instead of n, if n be the number of
tests.

We have had perforce to draw our diagram as though 
it were in a three-fold instead of an n-fold space: and for 
this new (n — 1)-fold space we have drawn an ordinary 
plane, like a drawing-board, at right angles to OC, and 
projected the five test lines on to it. The next thing is to 
find the centroid of these five directed lines, these vectors, 
on the drawing-board. But we find at once that they are 
in equilibrium. If they were forces, the point O would 
not move. That is because OC is indeed the centroid of 
the original lines. This fact of equilibrium corresponds to 
the fact that the columns of residues add up to zero. 
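
The equilibrium of the projected vectors can be seen in the arithmetic itself. The sketch below assumes the usual centroid rule, by which each first loading is the column sum divided by the square root of the grand total, and applies it with unities in the diagonal to the four-test matrix of correlations printed in Chapter VII; the function names are illustrative assumptions.

```python
import math

# Matrix of correlations from Chapter VII, with unities in the diagonal cells.
R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]


def first_centroid_loadings(matrix):
    """Projections of the unit test vectors on to the centroid line OC."""
    n = len(matrix)
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(column_sums)  # squared length of the resultant of the test vectors
    return [c / math.sqrt(grand_total) for c in column_sums]


def residues(matrix, loadings):
    """What is left of each cell when the first factor's share is removed."""
    n = len(matrix)
    return [[matrix[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


loadings = first_centroid_loadings(R)
residual = residues(R, loadings)
residual_column_sums = [sum(row[j] for row in residual) for j in range(len(R))]
```

The column sums of the residues come out as zero, apart from rounding, which is the arithmetical counterpart of the vectors on the drawing-board being in equilibrium.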

To get over this, in the arithmetic, we changed the signs 
of some rows and corresponding columns, till, if possible, 
all cells were positive. (These cells of the residues are the 
cosines of the angles on the drawing-board, some of which 
are clearly obtuse, with negative cosines.) This reversal of 
signs in the arithmetic corresponds, in our diagram, to 
reversing some of the vectors on the drawing-board, till 
they again form a sheaf, as close as possible. Two are 
shown as reversed in our figure, and most of the angles are 
now acute, most of the cosines positive. It is desirable to 
make the sheaf as compact as possible, corresponding to 
making as many cells positive as possible. 

The centroid of the resulting sheaf of vectors (or forces)
is the second factor. Its dimension is next abolished,
by projection on to a space of (n − 2) dimensions, and so
on, and so on. Following this in a drawing is beyond our
powers of delineation, but if the reader will in imagination
conceive of our first sheaf of test lines being in n
dimensions, and being step by step projected on to spaces
of (n − 1), (n − 2), and lesser dimensions, he will have a
picture corresponding to the arithmetical summation process
and the sign reversals in the residues.


For simplicity we have above supposed that unities 
were being left in the diagonal cells, in which case as many 
common factors would emerge as there were tests, and
there would be no specifics. If communalities are inserted
and the rank of the matrix of correlations reduced, there
will be fewer common factors. Our diagram would then
be in the common-factor space and, indeed, can still serve, 
if we suppose the distances from O to the dots on the test 
lines to be not unity, but the square roots of the commun- 
alities, and the angles to be the projections of those between 
the full test lines. With that change, our diagram would 
be one for the communal parts of five tests with three 
common factors, represented by OC, by the resultant of 
the vectors on the drawing-board (after the reversals to 
destroy the equilibrium), and by a third line also on the 
drawing-board, at right angles again. 

16. Principal components.—The object of using centroids 
as axes in the above process is to obtain axes in diminishing 
order of importance as describers of the test lines. In the 
current jargon, they each “ take out ” as much variance as 
possible at each step—or rather, not quite as much as 
possible, though nearly so. There is another set of lines 
which actually do take out as much as possible. They are 
the lines corresponding to the axes of the ellipsoid of 
Figure 15, or the more general ellipsoids of higher dimen- 
sions. The centroid OC in our Figure 20 is in such a 
position that the sum of the squares of the vertical distances 
of the test dots to it is very small, nearly as small as 
possible. Another line, however, quite close to OC and 
corresponding to the major axis of the ellipsoid, makes this 
sum of squares an absolute minimum, and the sum of 
squares of the loadings of the factor a maximum. 
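
How nearly the centroid attains the maximum may be judged from the four-test matrix used in the next chapter. The following worked line is a sketch only, assuming unities in the diagonal cells and the usual centroid rule (each loading is the column sum divided by the square root of the grand total). The column sums are then 2.0, 2.4, 2.4 and 1.8, the grand total is 8.6, and the sum of squares of the first centroid loadings l_i is

```latex
\[
\sum_i l_i^2 \;=\; \frac{2.0^2 + 2.4^2 + 2.4^2 + 1.8^2}{8.6}
\;=\; \frac{18.76}{8.6} \;\approx\; 2.181 ,
\]
```

which falls a little short of the first latent root, 2.198089, found for the same matrix in Chapter VII.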

In Section 5 above we spoke of converting the ellipsoid 
of our Figure 15 into a sphere by swivelling the three test 
lines nearer to each other till the cosines of their angles 
correspond to the correlation coefficients, and the test lines 
take up positions such as they have in our Figure 20. 
When this is done, the major axis of the ellipsoid takes up 
a position among the test lines, quite near to the centroid 
but not quite coinciding, and with the property of maxi- 
mizing the “ variance taken out.” Similarly, the other 
principal axes of the ellipsoid, when the change is made in 
the space, replace for the better the later centroids of the
simpler process. The arithmetical method of calculating
their loadings is explained in our next chapter. 


CHAPTER VII 
PRINCIPAL COMPONENTS 


1. A historical accident.—By a historical accident, the 
method of principal components is associated in the minds 
of psychologists with analyses in which unities, and not 
communalities, are used in the diagonal cells of the square 
table of correlations. The centroid method can, however, 
equally well be used on such a table, giving the centroids of 
the complete test vectors in the whole test space: and the 
principal components of the communality vectors, in the 
common-factor space, can be found, using communalities in 
the diagonal cells, by the same iterative process as we are 
about to describe. As, however, this method was originally 
used on unit entries, we shall first make a principal com- 
ponents analysis of the whole tests of the example already 
used for the centroid process. Later we shall analyse the 
communality vectors by the same process (page 118). 

2. A calculation.—The actual calculation of the loadings
of principal components requires, for its complete
understanding, a grasp of the method of finding algebraically the
principal axes of an ellipsoid, a problem which will be
found dealt with in three dimensions in any text-book on
solid geometry. We give an account of this, for n dimensions,
in the Appendix. Here we shall only explain
Hotelling’s (1933) ingenious iterative method of doing this
arithmetically, by means of an example, for which we shall
use the matrix of correlations already employed in Chapter
V to illustrate the centroid method (see page 108).

    Correlations                  Successive multipliers

    1.0    .4    .4    .2          .8     .78     .775
     .4   1.0    .7    .3         1.0    1.00    1.000
     .4    .7   1.0    .3         1.0    1.00    1.000
     .2    .3    .3   1.0          .7     .65     .637

    .80    .32    .32    .16
    .40   1.00    .70    .30
    .40    .70   1.00    .30
    .14    .21    .21    .70

Hotelling’s arithmetical process then begins with a guess 
at the proportionate loadings of the first principal com- 
ponent. Practically any guess will do—a bad guess will 
only make the arithmetic longer. We have guessed -8, 1, 
1, ‘7, the numbers to be seen on the right of the matrix, 
because these numbers are roughly proportional to the 
sums of the four columns, and such numbers usually give 
a good first guess. 

Each row of the matrix is then multiplied by the guessed 
number on its right, giving the matrix below the first one, 
beginning with .80. We then take, as our second guess,
numbers proportional to the sums of the columns of this
matrix,* namely:

         1.74   2.23   2.23   1.46
giving    .78   1      1       .65

That is, we divide the sums of the columns by their largest
member, and use the results as new multipliers. They
are seen placed farther on the right of the original matrix.
It is unusual for two of them to be of the same size; that
is a peculiarity of our example.

It is always the original matrix whose rows are multiplied
by each improved set of multipliers. The above set gives
the next matrix shown, that beginning with .780, and the
sums of its columns,

         1.710   2.207   2.207   1.406

give a third guess at the multipliers, namely:

          .775   1       1        .637

* When a calculating machine is being used, this matrix will not
be actually written down; the column sums will be arrived at on the
machine.

And so the reiteration goes on, and the reader, who is
advised to carry it a stage farther at least, would find if he 
persevered that the multipliers would change less and less. 
If he went on long enough, he would reach this point 
(usually, however, far fewer decimals are sufficient) 


1.0        .4         .4         .2        |   .772865
 .4       1.0         .7         .3        |  1.000000
 .4        .7        1.0         .3        |  1.000000
 .2        .3         .3        1.0        |   .629811

 .772865   .309146    .309146    .154573
 .400000  1.000000    .700000    .300000
 .400000   .700000   1.000000    .300000
 .125962   .188943    .188943    .629811

1.698827  2.198089   2.198089   1.384384
giving .772865  1         1          .629813


that is, totals in exactly the same proportion as the multi- 
pliers. These final multipliers (or earlier ones if the experi- 
menter is content with less exact values) are then propor- 
tionate to the loadings of the first principal component in 
the four tests. They have, however, to be reduced until 
the sum of their squares equals the largest total, 2.198089,
which is called the first “latent root” of the original
matrix. This is done by dividing them by the square root
of the sum of their squares and multiplying them by the
square root of the latent root. They then become:

.662   .857   .857   .540
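
The arithmetic just described can be sketched in a few lines of Python. This is only an illustration: the function name `hotelling_first_component`, the tolerance and the iteration cap are assumptions of the sketch and form no part of Hotelling's own account.

```python
# Correlation matrix of the four tests, with unities in the diagonal cells.
R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]


def hotelling_first_component(matrix, guess, tol=1e-9, max_iter=1000):
    """Iterate the multipliers; return (multipliers, latent root, loadings).

    The name, tolerance and iteration cap are illustrative assumptions.
    """
    n = len(matrix)
    multipliers = list(guess)
    root = 0.0
    for _ in range(max_iter):
        # Multiply each row by its multiplier and add up the columns.
        sums = [sum(multipliers[i] * matrix[i][j] for i in range(n))
                for j in range(n)]
        root = max(sums)                # the largest column total
        new = [s / root for s in sums]  # divide by the largest member
        change = max(abs(a - b) for a, b in zip(new, multipliers))
        multipliers = new
        if change < tol:
            break
    # Reduce the multipliers until their sum of squares equals the root.
    norm = sum(m * m for m in multipliers) ** 0.5
    loadings = [m / norm * root ** 0.5 for m in multipliers]
    return multipliers, root, loadings


multipliers, root, loadings = hotelling_first_component(R, [0.8, 1.0, 1.0, 0.7])
print([round(m, 6) for m in multipliers])
print(round(root, 6))
print([round(x, 3) for x in loadings])
```

Any other reasonable first guess may be passed in place of .8, 1, 1, .7; a worse guess only lengthens the loop.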
The next step in Hotelling’s process is similar to one
with which we have already become familiar in Thurstone’s
method. The parts of the variances and correlations
due to this first component are calculated and subtracted
from the original experimental matrix. These
variances and correlations due to the first component
are shown in the first matrix below.

The residual matrix is then treated in exactly the same
way as the original matrix, the beginnings of the process
being shown below. There is no need, in this process, for
sign-changing.

Loadings              .662    .857    .857    .540

Matrix due to         .439    .567    .567    .357
first principal       .567    .734    .734    .462
component             .567    .734    .734    .462
                      .357    .462    .462    .291

Residual              .561   -.167   -.167   -.157  |   .3     .18
matrix               -.167    .266   -.034   -.162  |  -.4    -.38
                     -.167   -.034    .266   -.162  |  -.4    -.38
                     -.157   -.162   -.162    .709  |  1.0    1.00

Rows multiplied       .168   -.050   -.050   -.047
by the guessed        .067   -.106    .013    .065
multipliers           .067    .013   -.106    .065
                     -.157   -.162   -.162    .709

Sums of columns       .145   -.305   -.305    .792

The guessed multipliers, proportional to
the sums of the columns, are not so near the truth this
time, for the first one, which we have guessed at .3, and
which reduces after one operation to .18, goes on reducing
until it becomes negative, the final values of these second
loadings being as shown in the appropriate column of the
following table, which also gives the loadings of the third
and fourth factors, obtained in the same way. The variances
and correlations due to each factor in turn are
subtracted from the preceding residual matrix and the new
residual matrix analysed for the next factor:


Factor             I          II          III         IV       Sum of squares

Test 1          .662218    -.323324     .675967       .              1
Test 2          .856836    -.135197    -.312332    -.387298          1
Test 3          .856836    -.135197    -.312332     .387298          1
Test 4          .539645     .826092     .162323       .              1

Sum of
squares *      2.198090     .823526     .678383     .300000          4

Percentages      55.0        20.6        16.9         7.5           100


* These four quantities are, in the Hotelling process, what are
called the “latent roots” of the matrix. Their product gives the
value, .3684, of the determinant of the matrix of correlation
coefficients.
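
The whole procedure of extracting a component, subtracting its contribution and analysing the residues can be sketched as follows. The sketch departs from the text in two labelled respects: the trial vector is kept at unit length instead of being divided by its largest member, and an arbitrary starting vector (1, 2, 3, 4) is used instead of the column sums, because the column sums of the last residual matrix vanish. The function names are assumptions of the sketch.

```python
R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]


def largest_component(matrix, start, tol=1e-12, max_iter=100000):
    """Return (latent root, loadings) of the largest component of `matrix`."""
    n = len(matrix)
    norm = sum(x * x for x in start) ** 0.5
    v = [x / norm for x in start]
    root = 0.0
    for _ in range(max_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        root = sum(a * b for a, b in zip(v, w))  # estimate of the latent root
        norm = sum(x * x for x in w) ** 0.5
        new_v = [x / norm for x in w]
        change = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if change < tol:
            break
    # Loadings: unit vector scaled by the square root of the latent root.
    return root, [x * root ** 0.5 for x in v]


def principal_components(matrix, start):
    """Extract as many components as there are tests, one after another."""
    n = len(matrix)
    residual = [row[:] for row in matrix]
    roots, columns = [], []
    for _ in range(n):
        root, loading = largest_component(residual, start)
        roots.append(root)
        columns.append(loading)
        # Subtract the variances and correlations due to this component.
        for i in range(n):
            for j in range(n):
                residual[i][j] -= loading[i] * loading[j]
    return roots, columns


roots, columns = principal_components(R, [1.0, 2.0, 3.0, 4.0])
product = 1.0
for r in roots:
    product *= r
print([round(r, 6) for r in roots])
print(round(sum(roots), 6), round(product, 4))
```

The signs of a whole column of loadings may come out reversed relative to the table, which, as noted later in the chapter, changes neither the variances nor the correlations.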
An alternative method of finding principal components,
due to Kelley, is to deal with the variables two at a time. 
The pair first chosen are rotated in their plane until they 
are uncorrelated. Then the same is done to another pair, 
and so on, the new uncorrelated variables being in turn 
paired with others, until finally all correlations are zero. 
(Kelley, 1935, Chapters I and VI.) A chief advantage is 
that the components are obtained pari passu, and not 
successively ; also, in certain circumstances where Hotel- 
ling’s process converges very slowly, Kelley’s is quicker. 
The end-results are the same. 

3. Acceleration by powering the matrix.—In a later paper 
Hotelling pointed out that his process of finding the load- 
ings of the principal components can be much expedited 
by analysing, not the matrix of correlations itself, but its 
square, or fourth, eighth, or sixteenth power, got by 
repeated squaring (Hotelling, 1935b). Squaring a sym- 
metrical matrix is a special case of matrix multiplication 
(see Chapter X, Section 4, page 145): it is done by finding 
the “ inner products ” (see footnote, page 74) of each pair of 
rows, including each row with itself, and setting the 


results down in order. Applying this to the correlation
matrix:

1.0    .4    .4    .2
 .4   1.0    .7    .3
 .4    .7   1.0    .3
 .2    .3    .3   1.0

we see that the inner product of the first row with itself
is 1.36; of the first row with the second, 1.14; and so
on. Setting these down in order, we get for the matrix
squared:

1.36   1.14   1.14    .64
1.14   1.74   1.65    .89
1.14   1.65   1.74    .89
 .64    .89    .89   1.22
Exactly the same process is applied to this, beginning
with guessed multipliers, as we applied to the original 
matrix. The multipliers, however, settle down twice as 
rapidly towards their final values, which are the same here 
as there. We have finally : 


1.36       1.14       1.14        .64       |   .772865
1.14       1.74       1.65        .89       |  1.000000
1.14       1.65       1.74        .89       |  1.000000
 .64        .89        .89       1.22       |   .629811

1.051096    .881066    .881066    .494634
1.140000   1.740000   1.650000    .890000
1.140000   1.650000   1.740000    .890000
 .403079    .560532    .560532    .768369

3.734175   4.831598   4.831598   3.043003

Ratio  .772865   1.000000   1.000000    .629812


The “latent root,” however, or largest total, 4.831598, is
the square of the former latent root, 2.198090, so that its
square root must be taken before we complete finding the
loadings.

In exactly the same way the squared matrix may be 
again squared, and again and again, before we analyse it. 
The more we square it, the quicker the Hotelling iteration 
process works. The end multipliers are always the same, 
but the “ root ” is the same power of the root we need as 
is the matrix of the original matrix. 
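
The gain from powering can be seen by counting how many rounds of the iteration are needed on the matrix itself and on its second, fourth and eighth powers. The following is a sketch only; the helper names `square` and `iterate` and the stopping tolerance are assumptions of the example.

```python
R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]


def square(matrix):
    """Square a symmetrical matrix by inner products of each pair of rows."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


def iterate(matrix, guess, tol=1e-6, max_iter=1000):
    """Hotelling's iteration; return (multipliers, largest total, rounds)."""
    n = len(matrix)
    multipliers = list(guess)
    root, rounds = 0.0, 0
    for rounds in range(1, max_iter + 1):
        sums = [sum(multipliers[i] * matrix[i][j] for i in range(n))
                for j in range(n)]
        root = max(sums)
        new = [s / root for s in sums]
        change = max(abs(a - b) for a, b in zip(new, multipliers))
        multipliers = new
        if change < tol:
            break
    return multipliers, root, rounds


results = {}
power = R
for exponent in (1, 2, 4, 8):
    results[exponent] = iterate(power, [0.8, 1.0, 1.0, 0.7])
    power = square(power)

for exponent, (multipliers, root, rounds) in results.items():
    # The largest total is the same power of the root we need,
    # so the corresponding root must be taken.
    print(exponent, rounds, round(root ** (1.0 / exponent), 6))
```

The end multipliers agree in every case; only the number of rounds and the power of the root differ.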

A still further acceleration of the process is due to Cyril 
Burt, who observed that as the matrix is repeatedly 
squared it becomes more and more nearly hierarchical, 
including the diagonal cells (Burt, 1937a). This is due 
to the largest factor increasingly predominating as it is 
“powered,” especially if the largest latent root is widely 
separated from the others. In consequence, the square
roots of the diagonal cells become more and more nearly
in the ratio of the Hotelling multipliers, and form an
excellent first guess for the latter. When our matrix
is squared twice again, giving the eighth power, it
becomes:

108.78   140.67   140.67    88.54
140.67   182.03   182.03   114.61
140.67   182.03   182.03   114.61
 88.54   114.61   114.61    72.38

and the square roots of its diagonal members are:

10.429   13.492   13.492   8.508

which are in the ratio

.7730   1   1   .6306

very near indeed to the Hotelling final multipliers:

.772865   1   1   .629811
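
Burt's first guess is easily checked. The sketch below squares the matrix three times and takes the square roots of the diagonal cells; the helper name `square` is an assumption of the example.

```python
R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]


def square(matrix):
    """Square a symmetrical matrix by inner products of each pair of rows."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


eighth = R
for _ in range(3):  # second, fourth, eighth power
    eighth = square(eighth)

diagonal_roots = [eighth[i][i] ** 0.5 for i in range(len(eighth))]
largest = max(diagonal_roots)
first_guess = [r / largest for r in diagonal_roots]

print([round(r, 3) for r in diagonal_roots])
print([round(g, 4) for g in first_guess])
```

The resulting ratios can be handed to the iteration as its starting multipliers, after which very few rounds are needed.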


Hotelling gives a method of finding the residues, for the 
purpose of calculating the next factor loadings, from the 
“powered ” matrix. But it may be so nearly perfectly 
hierarchical that this fails unless an enormous number of 
decimals have been retained, and it is in practice best to 
go back to the original matrix and obtain the residues 
from it. Their matrix can in turn be squared, and so on. 
Other and very powerful methods of acceleration will 
be found described in Aitken, 1937. 

4. Properties of the loadings.—If all the principal com- 
ponents are calculated accurately, and if unities were used 
in the diagonal cells, their loadings ought completely to 
exhaust the variance of each test ; that is, the sum of the 
squares of the loadings in each row should be unity. The
sum of the squares of the loadings in each column equals 
the “ latent root ” corresponding to that column, and the 
sum of the four latent roots is exactly equal to the number 
of tests. Each latent root represents the part of the whole 
variance of all the tests which has been “ taken out ” by 
that factor. Thus the first factor “takes out” 55 per
cent., the first two factors together 75.6 per cent., of the
variance of the original scores. The four factors account
for all the variance.

The correlations which correspond to the loadings given
in the table on page 111 are obtained by finding the
“inner product” of each pair of rows. Applying this to
the table we find the correlation $r_{24}$, say, to be:

.856836 × .539645 − .135197 × .826092 − .312332 × .162323
− .387298 × zero = .300000

In this way the loadings of the four principal components
will exactly reproduce the correlations we began
with. If, however, we have stopped the analysis after we
have found only two principal components (or factors),
these two would have reproduced the correlations only
approximately. For example, for $r_{24}$ we should only
have:

.856836 × .539645 − .135197 × .826092
= .350702 instead of .300000
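
The same inner products can be formed mechanically from the table of loadings. In this sketch the loadings are copied from the table on page 111 (the two blank cells being taken as zero), and the function name `reproduced_correlation` is an assumption of the example.

```python
# Rows are tests 1 to 4; columns are factors I to IV.
LOADINGS = [
    [0.662218, -0.323324, 0.675967, 0.0],
    [0.856836, -0.135197, -0.312332, -0.387298],
    [0.856836, -0.135197, -0.312332, 0.387298],
    [0.539645, 0.826092, 0.162323, 0.0],
]


def reproduced_correlation(loadings, i, j, n_factors):
    """Inner product of rows i and j over the first n_factors columns."""
    return sum(loadings[i][k] * loadings[j][k] for k in range(n_factors))


# Tests 2 and 4 are rows 1 and 3 when counting from zero.
r24_full = reproduced_correlation(LOADINGS, 1, 3, 4)
r24_two = reproduced_correlation(LOADINGS, 1, 3, 2)

# A row taken with itself gives the variance accounted for in that test.
variances = [reproduced_correlation(LOADINGS, i, i, 4) for i in range(4)]

print(round(r24_full, 6), round(r24_two, 6))
print([round(v, 4) for v in variances])
```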

Before we leave the table of loadings, we may note that 
the signs of any column of the loadings can be reversed 
without changing either the variances or the correlations. 
Reversing the signs in a column merely means that we 
measure that factor from the opposite end, as we might 
rank people either for intelligence or stupidity and get the 
same order, but reversed. We will usually desire to call 
that direction of a factor positive which most conforms 
with the positive direction of the tests themselves, and 
therefore we will usually make the largest loading in each 
column positive. 

All the loadings of the first principal factor are, in an
ordinary set of tests, positive. Of the other loadings,
about half are negative.

5. Calculation of a man’s principal components. — 
Factors obtained by using unities, and not communalities, 
in the diagonal cells have an important advantage. They 
can be calculated exactly from a man’s scores, whereas 
communality factors can only be estimated. This is 
because the former are never more numerous than the tests, 
whereas the latter, including the specifics, are always more 
numerous than the tests. For the former, therefore, we 
always have just the same number of equations as un- 
knowns, whereas we have more unknowns than equations 
when communalities are used.

We have hitherto given the analysis of tests into factors
in the form of tables of loadings. But we can alternatively
write them out as “specification equations,” as we shall
call them. Thus the table on page 111 would be written:

\[
\begin{aligned}
z_1 &= .662218\,y_1 - .323324\,y_2 + .675967\,y_3 \\
z_2 &= .856836\,y_1 - .135197\,y_2 - .312332\,y_3 - .387298\,y_4 \\
z_3 &= .856836\,y_1 - .135197\,y_2 - .312332\,y_3 + .387298\,y_4 \\
z_4 &= .539645\,y_1 + .826092\,y_2 + .162323\,y_3
\end{aligned}
\]


Here $z_1$, $z_2$, $z_3$, and $z_4$ stand for the scores in the four
tests, measured in standard units; that is, measured from
the mean in units of standard deviation. The factors
$y_1$, $y_2$, $y_3$, and $y_4$ are also supposed to be measured in such
units. These specification equations enable us to calculate
any man’s standard score in each test if we know his
factors, and since there are just as many equations as
factors, they can be solved for the y’s and enable us to
calculate, conversely, any man’s factors if we know his
scores in the tests. The solution to these Hotelling equations
for the y’s happens to be peculiarly simple, as we
shall prove in the Appendix, Section 7. It is as follows:

\[
\begin{aligned}
y_1 &= (.662218\,z_1 + .856836\,z_2 + .856836\,z_3 + .539645\,z_4) \div 2.198090 \\
y_2 &= (-.323324\,z_1 - .135197\,z_2 - .135197\,z_3 + .826092\,z_4) \div .823526 \\
y_3 &= (.675967\,z_1 - .312332\,z_2 - .312332\,z_3 + .162323\,z_4) \div .678383 \\
y_4 &= (-.387298\,z_2 + .387298\,z_3) \div .300000
\end{aligned}
\]

The table on page 111, therefore, serves a double purpose.
Read horizontally it gives the composition of each test in
terms of factors. Read vertically it gives the composition
of each factor in terms of tests, if we divide the result by
the root at the foot of the column.*


Suppose, for example, that a man or child has the fol- 
lowing scores in the four tests— 


1.29   .36   .72   1.03

This is evidently a person above the average in each test,
since the scores are all positive. His factors will be
obtained by substituting these scores for the z’s in the
above equations, with the result:

\[
\begin{aligned}
y_1 &= 1.062504 \\
y_2 &= .349441 \\
y_3 &= 1.034624 \\
y_4 &= .464757
\end{aligned}
\]

* If the analysis has been performed with “reliabilities” in the
diagonal cells instead of units, the statement in the text still holds
(Hotelling, 1933, 498). If on correlations corrected for “attenuation,”
the matter is more complicated (ibid. 499-502).


(Of course, in practical work six decimal places would be 
absurd. They are given here because we are using this 
artificial example to illustrate theoretical points, in place 
of doing algebraic transformations, and they need, there- 
fore, to be exact.) 

If these values for the factors are now inserted in the 
specification equations opposite, the scores z in the test
will be reproduced exactly (1.29, .36, .72, and 1.03).

Notice, too, that if we have stopped our analysis at less
than the full number of principal components using unities
in the diagonal cells, we can nevertheless calculate these
factors for any person exactly. As soon as we have the
first column of the table on page 111, we can calculate $y_1$ for
anyone whose scores z we know.

Had we done this with the person whose scores are given
above, we should have summarized his ability in these four
tests by the one statement:

\[ y_1 = 1.062504 \]

This would have been an incomplete statement, but it 
is the best single statement that can be arrived at. 
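
The double use of the table, read vertically for the factors and horizontally for the tests, can be sketched as follows. The loadings and latent roots are copied from the table on page 111; the function names `factor_scores` and `scores_from_factors` are assumptions of the example.

```python
# Rows are tests 1 to 4; columns are factors I to IV.
LOADINGS = [
    [0.662218, -0.323324, 0.675967, 0.0],
    [0.856836, -0.135197, -0.312332, -0.387298],
    [0.856836, -0.135197, -0.312332, 0.387298],
    [0.539645, 0.826092, 0.162323, 0.0],
]
ROOTS = [2.198090, 0.823526, 0.678383, 0.300000]


def factor_scores(loadings, roots, z):
    """Read each column vertically and divide by the root at its foot."""
    return [sum(loadings[i][k] * z[i] for i in range(len(z))) / roots[k]
            for k in range(len(roots))]


def scores_from_factors(loadings, y):
    """Read each row horizontally: the specification equations."""
    return [sum(row[k] * y[k] for k in range(len(y))) for row in loadings]


z = [1.29, 0.36, 0.72, 1.03]
y = factor_scores(LOADINGS, ROOTS, z)
z_back = scores_from_factors(LOADINGS, y)

print([round(v, 6) for v in y])
print([round(v, 4) for v in z_back])
```

Passing only the first column and the first root gives $y_1$ alone, which is the single summarizing statement spoken of above.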

6. Principal components in the common-factor space.— 
Exactly the same iterative Hotelling process for finding the 
principal components, the principal axes, of the ellipsoids
of density of the person-points can be applied to the table
of correlations with communalities in the diagonal cells.
The ellipsoidal swarm of person-points, in the full test space
with orthogonal axes for the tests, remains an ellipsoidal
swarm (though one of fewer dimensions) when projected
on to the common-factor space. The mathematical reader
will know this, or can work it out. The non-mathematical
reader knows it well enough in the number of dimensions
he is personally acquainted with: e.g. an egg, which is an
ellipsoid of three dimensions, throws a shadow on a wall
which is an ellipse, i.e. an ellipsoid of two dimensions. We
shall now analyse the same set of correlation coefficients
using the communalities .2667, .7, .7, .15, which we know,
from Chapter V, page 77, reduce the rank of the matrix
to two, and give an analysis with only two common factors. 
We found on page 78 the two centroid common factors. 
We shall now find the two principal components and find 
them very similar. 
7. Calculation with communalities 


 .2667    .4     .4     .2    |   .7    .59    .5913
 .4       .7     .7     .3    |  1     1      1
 .4       .7     .7     .3    |  1     1      1
 .2       .3     .3     .15   |   .5    .45    .4435

1.0867   1.83   1.83    .815


Taking .7, 1, 1, .5 as a first guess at the multipliers, we find
the weighted sums of the columns to be as shown, and on
dividing through by 1.83 we get the next set of multipliers
.59, 1, 1, .45. Continuing in this way, we arrive quite
soon at .5913, 1, 1, .4435, which, when used as weights,
reproduce themselves. When reduced until the sum of
their squares equals 1.7696 (the largest column total with
these weights), the loadings are:

.4929   .8336   .8336   .3697

Subtracting the cross-products of these from the original
matrix, and operating on the residues in exactly the same
iterative way, we get for the second factor loadings:

.1540   −.0712   −.0712   .1153, and no residues.*


If we compare these principal component loadings with the 
centroid loadings (page 78) obtained with the same com- 
munalities, we see that they are very similar. But the 
sum of squares of the loadings of the first principal component
(1.7694) is slightly larger than the same sum for the
first centroid loadings (1.7652). The principal components
take out at each stage the maximum possible variance
(sum of squares of loadings). The centroids nearly do
so if the sign-changing is carefully done, but not quite.
The centroids can best be looked at as approximations to the
principal components, more easily calculated. In a battery
of many tests, say two dozen, and with any given
communalities, the principal component process (“weighted
summation”) will take out more variance in, say, six
factors, and leave smaller residues, than will centroid
factors.* But with the kind of data available in psychology,
this advantage does not outweigh the disadvantage of
longer calculation.

* The sums of squares of the loadings (1.7694 and .0471) are the
two first latent roots of the matrix with communalities. The other
two latent roots are zero. The sum of the latent roots equals the
sum of the communalities, the “trace” of the matrix as it is called.
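
The calculation with communalities can be checked in the same manner. The sketch below rests on two labelled assumptions: the first communality is taken as .2667 (the value implied by the column total 1.0867), and the trial vector is kept at unit length instead of being divided by its largest member. The function name `largest_component` and the starting vectors are likewise assumptions.

```python
# Correlations with communalities in the diagonal cells.
A = [
    [0.2667, 0.4, 0.4, 0.2],
    [0.4, 0.7, 0.7, 0.3],
    [0.4, 0.7, 0.7, 0.3],
    [0.2, 0.3, 0.3, 0.15],
]


def largest_component(matrix, start, tol=1e-12, max_iter=100000):
    """Return (latent root, loadings) of the largest component of `matrix`."""
    n = len(matrix)
    norm = sum(x * x for x in start) ** 0.5
    v = [x / norm for x in start]
    root = 0.0
    for _ in range(max_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        root = sum(a * b for a, b in zip(v, w))  # estimate of the latent root
        norm = sum(x * x for x in w) ** 0.5
        new_v = [x / norm for x in w]
        change = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if change < tol:
            break
    return root, [x * root ** 0.5 for x in v]


root1, load1 = largest_component(A, [0.7, 1.0, 1.0, 0.5])

# Subtract the cross-products of the first loadings to get the residues.
residual = [[A[i][j] - load1[i] * load1[j] for j in range(4)] for i in range(4)]
root2, load2 = largest_component(residual, [1.0, -0.5, -0.5, 1.0])

trace = sum(A[i][i] for i in range(4))
print(round(root1, 4), [round(x, 4) for x in load1])
print(round(root2, 4), [round(x, 4) for x in load2])
print(round(trace - root1 - root2, 4))
```

The last line printed is what remains of the trace after two factors, which should be very nearly zero if the matrix is of rank two.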

8. Iterative methods—Both in the above Hotelling cal- 
culation, and in our discussion of communalities on page 
88, we have seen examples of iterative processes, where a
first guess at certain constants gives results which can be 
used as a better guess, which gives results which can be 
used as a still better guess, which gives . . . and so on 
and so on, until the stage is reached where the same con- 
stants emerge as were put in. This sort of process, where 
repetition after repetition converges to a steady result 
giving some maximum or minimum value to some quantity, 
is not uncommon in mathematics and is rather mysterious 
and magical to the layman. An analogy will perhaps 
assist understanding. Robinson Crusoe wants to make a 
lathe, but he has no wheels and spindles, and to make 
wheels and spindles he needs a lathe! He can, however, 
whittle crude makeshift wooden wheels, etc., with a knife, 
and make a crude lathe with them, with which lathe he can 
make somewhat better wheels and therefore a somewhat 
better lathe, with which he can make still better wheels 
. . . and so on, till he reaches perfection. 


* If the Hotelling process is used with guessed communalities, and 
the whole is iterated (as was done with centroids on page 88) the 
communalities will converge to a set minimizing the sum of squares 
of the residuals for a given number of factors. The maximum likeli- 
hood method of Chapter IX arrives at communalities (I understand 
from Dr. Lawley) which minimize a weighted sum of squares of the
residuals, each weight being the product of the reciprocals of the two
specific variances concerned.


CHAPTER VIII 
TESTING RESIDUES FOR SIGNIFICANCE 


1. The object of factorial analysis —As was said in the first 
section of Chapter I, the objects of factorial analysis are 
both practical and theoretical. The practical desire is to 
reduce the description of a man’s mind* to a comparatively 
few quantitative statements, instead of an unwieldy record 
of innumerable test scores, with a view to giving vocational 
or educational advice. The hope, on the theoretical side, 
is that the “factors” found may form the structure of a
theory of mind: and there are some who hope that physio- 
logical or neurological bases may be found for them. Our 
concern in this chapter is with the first point: how to 
reduce the number of “ factors” without sacrificing any 
significant fraction of the information. The insertion of 
communalities in the diagonal cells of a table of correla- 
tions is by many looked upon as one way of doing this, 
since it reduces the number of common factors. Simul- 
taneously, however, it creates and maximizes the influence 
ascribed to specific factors, and the total number of factors 
is increased, not diminished. This will not be discussed 
in the present chapter, which is concerned with another 
way of reducing the number of factors, applicable whether 
communalities or full variances are analysed. If the idea 
of communalities and specifics had never occurred to any- 
one, it would still have been possible to reduce the number 
of significant common factors to a number less than the 
number of tests. Each principal component, found as 
described in Chapter VII, causes the remaining residues to 
be as small as can be: and the centroid factors of Chapter 
V are nearly as good, if the sign-changing is done properly. 
If, after a few such factors have been extracted, the 
residues are so small as to be statistically negligible, we 

* Or of other objects of study, say in agriculture or in engineering. 
See Chapter XII, Section 7. 
might as well stop the analysis, content with the few factors
extracted. We need, therefore, some test of statistical 
significance, applicable to such residual correlations, to 
know if they are negligible. 

2. The general idea of significance-—The general prin- 
ciple of such a test of significance is this, that if the residues 
we have found, or in practice some function of them, could 
only rarely have been produced by the action of chance 
sampling, we will assume that they are not due to sampling 
but to another factor. How we define “ rarely ” depends 
on circumstances. Usually in psychology “ once in twenty 
times ” (the 5 per cent. point as it is called) is rare enough 
to justify taking out another factor. The principle is
straightforward enough; the mathematical difficulty of
finding formulae for calculating the chances is, however, very
great, even for principal components with full variances,
and insuperable when the centroid method is used with
guessed communalities. In consequence, a number of
rule-of-thumb criteria have been put forward, to decide
when to stop factorizing.

3. Empirical rules for the number of factors.—Thurstone
(1938a, 65 et seq.) discusses some of the earlier ones. A criterion
which appeals to common sense is based simply on the
algebraic sum of the residuals (excluding the diagonal cells)
after as many as possible of their signs have been made
positive by the process described in Chapter V (page 71).
So long as this sum goes on sinking, factorization is
continued. When it flattens, the last factor taken out is
rejected and the process stopped. Mosier (1939) found this
the best of five plans he tried, though none was wholly
satisfactory.

Ledyard Tucker’s criterion is that the ratio of the sums
of the absolute values of the residuals, including the diagonal
cell used, just after and just before the extraction of a
factor must be less than (n − 1)/(n + 1), where n is the
number of tests.
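
Tucker's ratio is simple to compute once the two sets of residuals are to hand. The sketch below is an illustration only. It reads the criterion as a condition for accepting the factor just extracted, which is an interpretation; and, purely to show the arithmetic, it borrows the matrix with unities and its first residual matrix from Chapter VII, although the criterion was framed for the centroid process. The function names are assumptions.

```python
def tucker_ratio(before, after):
    """Sum of absolute residuals just after a factor, over that just before."""
    total_after = sum(abs(x) for row in after for x in row)
    total_before = sum(abs(x) for row in before for x in row)
    return total_after / total_before


def passes_tucker(before, after):
    """True when the ratio is less than (n - 1)/(n + 1), n tests."""
    n = len(before)
    return tucker_ratio(before, after) < (n - 1) / (n + 1)


# Matrix analysed, and the residues left after the first factor.
before = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]
after = [
    [0.561, -0.167, -0.167, -0.157],
    [-0.167, 0.266, -0.034, -0.162],
    [-0.167, -0.034, 0.266, -0.162],
    [-0.157, -0.162, -0.162, 0.709],
]

ratio = tucker_ratio(before, after)
print(round(ratio, 3), passes_tucker(before, after))
```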
Coombs’ criterion depends upon the number of negative
signs left among the residuals after everything has been
done to reduce them by sign-changing, in the centroid
process. If they are few, another factor may be extracted.
More exactly, the permissible number is given in this
table:

Number of tests      10    15    20    25    30
Negative signs       31    79   149   242   358
Standard error . eee De vee TO: LZ)" me’. 


A fuller table is given in Coombs’ article (1941). 

An example of the use of these two will be found in 
Blakey (1940, 126). 

Quinn McNemar (1942), who considers both of the
above inadequate, gives a formula which includes N, the
size of the sample. He takes out factors until $\sigma_c$ reaches
or falls below $1/\sqrt{N}$, where

\[ \sigma_c = \sigma_s \div (1 - M_h), \]

$\sigma_s$ = st. dev. of the residuals after s factors,
$M_h$ = mean communality for s factors.


Others go on until the distribution of the residuals 
ceases to be significantly skew (Swineford, 1941, 378). 
Reyburn and Taylor (1939) divide the residuals by the 
probable errors of the original coefficients, and plot a 
distribution of the results disregarding signs. If it is 
significantly different from a normal curve of the same area 
and with standard deviation 1.4825, they take out more
factors. Swineford (1941, 377) finds the correlation
between the original correlations and the corresponding 
residuals and takes out factors till it is not significant. 

Another method is based on the sinking of the factor 
loadings with each successive factor instead of on the dying 
away of the residuals. Guilford and Lacey (1947 in a 
U.S. Air Force report) stop factorizing when the product 
of the two highest factor-loadings falls below $1/\sqrt{N}$.

P. E. Vernon, in a privately circulated manuscript, has 
tested some two dozen methods, as applied when the 
centroid or simple summation method of analysis is used 
with communalities, on two analyses of actual data, on
645 and 994 cases respectively (Vernon, 1947). His final
advice is to use the methods of Guilford and Lacey (product
of the two highest factor loadings) and of Mosier
(sum of the residuals), together with Burt’s empirical
formula for the standard error of each factor loading:

\[ \frac{(1 - l^{2})\sqrt{n}}{\sqrt{N(n - s + 1)}} \]

where l = loading, N = number of persons, n = number of
tests, s = the ordinal number of the factor. If half the
loadings of a factor fall below twice their standard errors 
thus found, Vernon recommends rejection of the factor. 

If these three methods do not agree, Vernon would
proceed to calculate McNemar’s $\sigma_c$ (opposite), and would
decide on the evidence of the four criteria, taking out another
factor if doubtful. 

4. More exact methods—The earliest method was to
compare each residue with the standard error of the original
correlation coefficient and cease factorizing when the
residues all sank below twice these standard errors. But
the use of the formula for the standard error of r is now
frowned upon because of the skewness of the distribution.
Moreover, sampling errors in the correlation coefficients,
being themselves correlated, produce further factors,
and the above-mentioned test tended to stop the analysis too
soon (Wilson and Worcester, 1939). These further factors
must be taken out in order to give elbow room for rotation
of the axes to some psychologically significant position.
But the error factors are not concentrated in the last
factors taken out, but have been entangled with all.
Usually more factors have to be taken out than can be
expected, on rotation, to yield meaningful psychological
factors, but all the dimensions are required nevertheless for
the rotations. In geometrical terms, some of the dimensions
of the common factor space will be due to sampling
error, but not the particular dimensions indicated by the
loadings of the last factors to be extracted. In Hotelling’s
plan, the whole ellipsoid is distorted by sampling error,
and the small axes are not necessarily due entirely to
sampling, nor the large ones free from it. A method is
given by Wilson and Worcester (1939, 139) which is, however,
laborious when the number of tests is large. See also Burt (1940,
338-40). Lawley (1940, 76 et seq.) repeated Wilson and
Worcester’s criticism and developed an accurate criterion 
described in the next chapter. This is probably the best 
plan to use in any research where great accuracy is necessary. 
And it is for the case where communalities are employed. 
It is, however, only legitimate when the factor loadings 
have been found by Lawley’s application of the method of 
maximum likelihood. 

Principal components lend themselves to exact treat- 
ment when full unities are used, i.e. there are no specifics 
assumed. Hotelling himself (1933, 437-41) discusses the 
matter of the number which are significant. Davis (1945) 
shows how to find the reliability of each principal compon- 
ent from the reliabilities of the tests, and finds that it may 
happen that a later component is more reliable than an 
earlier one. 

5. M. S. Bartlett’s test of significance for principal com- 
ponents.—Recently (Bartlett, 1950) a method has been 
described for deciding the significance of principal com- 
ponent factors which, while it is unlikely, in its present 
form at least, to be usable in any ordinary cases, ought 
to be briefly described here. It is highly desirable that 
exact methods, or methods where the assumptions made 
and the approximations permitted are clearly realized and 
set out, should gradually replace those based on experience 
only. Bartlett’s method depends upon the latent roots of 
the matrix of correlation coefficients with unity in each 
diagonal cell—it is not applicable to communalities. 

Latent roots have been mentioned on page 111, where 
they appear as the sums of squares of the loadings of the 
tests in each principal component. In the example there 
used, their values are:

\[
\begin{aligned}
\lambda_1 &= 2.198 \\
\lambda_2 &= .824 \\
\lambda_3 &= .678 \\
\lambda_4 &= .300
\end{aligned}
\]

They are equal in number to the tests, and their sum also
is exactly 4. Bartlett forms quantities R as follows:

\[
\begin{aligned}
R_2 &= \frac{\lambda_3\lambda_4}{\left(\dfrac{4 - \lambda_1 - \lambda_2}{2}\right)^{2}} = .8506, & \log_e R_2 &= -.16182 \\
R_3 &= \frac{\lambda_2\lambda_3\lambda_4}{\left(\dfrac{4 - \lambda_1}{3}\right)^{3}} = .7734, & \log_e R_3 &= -.25696 \\
R_4 &= \lambda_1\lambda_2\lambda_3\lambda_4 = .3684, & \log_e R_4 &= -.99858
\end{aligned}
\]

For these we require the natural logarithms, which are
not the same as the usual logarithms to the base ten. They
are given above. These logarithms, multiplied by a certain
coefficient, are an approximation to χ² for the successive
factors. The coefficient is:

\[ -\left\{ n - \frac{2p + 5}{6} - \frac{2}{3}k \right\} \]

where n is the number of persons tested less one, p is the
number of latent roots, i.e. of tests, and k is the number of
factors already dealt with, i.e. it takes in turn the values
0, 1, 2.

In our example p = 4. If we assume that the number of
persons tested was 20, so that Bartlett’s n = 19, we can
make this table:

d.f.                  χ²                           5 per cent. level

 6     −16.833 × (−.99858) = 16.8095                    12.59
 3     −16.167 × (−.25696) =  4.1542                     7.82
 1     −15.500 × (−.16182) =  2.5082                     3.84

The values in the last column are to be obtained from
a χ² table entered with the number of degrees of freedom
(d.f.) shown. Only the first factor is significant (16.8095
is greater than 12.59).

Had we assumed 29 children (n = 28) we should have
been faced by a peculiar result. The three values of χ²
are then 25.80, 6.47, and 3.96, so that it looks as though the
first factor and the third factor are significant, with the
factor in between not significant!* But Bartlett warns
(1950, 78) that this χ² test is only valid if the roots
already removed are significant. As soon as we come
to a non-significant factor, the later factors are also
non-significant. The last factor of all is not dealt with.
“Merely the correlation structure of the variables is being
investigated in its relation to variance,” says Bartlett
(page 80). “For this reason no significance can ever be
attached to the last root, for it would be equivalent to
asking for the correlation structure of a single variable.”†
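
The arithmetic of the test, for both sample sizes, can be sketched as follows. The latent roots are taken to three decimals as in the text, and the 5 per cent. points are those quoted in the table, not computed; the function name `bartlett_rows` is an assumption of the example. The degrees of freedom are those used in the text, namely one half of (p − k)(p − k − 1).

```python
import math

ROOTS = [2.198, 0.824, 0.678, 0.300]
# 5 per cent. points of chi-square, as quoted in the text, keyed by d.f.
FIVE_PER_CENT = {6: 12.59, 3: 7.82, 1: 3.84}


def bartlett_rows(roots, persons):
    """Return (k, d.f., chi-square, exceeds 5 per cent. point) for each k."""
    p = len(roots)
    n = persons - 1
    rows = []
    for k in range(p - 1):  # the last root of all is not dealt with
        remaining = roots[k:]
        mean = (p - sum(roots[:k])) / (p - k)
        ratio = math.prod(remaining) / mean ** (p - k)
        coefficient = -(n - (2 * p + 5) / 6 - 2 * k / 3)
        chi_square = coefficient * math.log(ratio)
        df = (p - k) * (p - k - 1) // 2
        rows.append((k, df, chi_square, chi_square > FIVE_PER_CENT[df]))
    return rows


rows_20 = bartlett_rows(ROOTS, 20)
rows_29 = bartlett_rows(ROOTS, 29)
for row in rows_20 + rows_29:
    print(row[0], row[1], round(row[2], 2), row[3])
```

The flags are the raw comparisons only. Bartlett's warning still applies: once a non-significant factor is reached, later factors are to be treated as non-significant whatever their own χ² may be.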


* Compare the report by Davis (1945) that a later component may 
be more reliable than an earlier one. 

† In a later paper (B.J.P. Statist. 4, p. 1) Bartlett warns that
after one or more significant components have been eliminated it is
safer to take as the number of degrees of freedom

\[ \tfrac{1}{2}(p - k - 1)(p - k + 2) \]

instead of

\[ \tfrac{1}{2}(p - k)(p - k - 1) \]

as used above. This would increase the degrees of freedom in the
second line of the analysis on page 125 from 3 to 5, and in the
third line from 1 to 2, and raise the 5 per cent. level.


CHAPTER IX 


THE MAXIMUM LIKELIHOOD METHOD OF 
ESTIMATING FACTOR LOADINGS * 


(by D. N. Lawley) 


1. Basis of statistical estimation.—In recent times attempts
have been made to introduce into factorial analysis statistical
methods developed in other fields of research. In
particular the method of statistical estimation put forward
by Fisher (1921, page 323 et seq.), and termed the method of
maximum likelihood, has been applied by Lawley (1940,
1941, 1943) to the problem of estimating factor loadings.
This method has the property of using the largest amount
of available information contained in the data and gives
“efficient” estimates, where such exist, of all unknown
parameters, i.e. estimates which, roughly speaking, are on
the average nearer the true values than those obtained by
other, “inefficient,” methods of estimation.

Before using the maximum likelihood method for estimating
factor loadings it is necessary to make certain
initial assumptions. We assume that both the test scores
and the factors, of which they are linear functions, are
normally distributed throughout the population of individuals
to be tested. This assumption of normality has
been the subject of some criticism, but in practice it would
appear that departure from strict normality of distribution
is not very serious. It is also necessary to make some
hypothesis concerning the number of general factors
which are present in addition to specifics. We shall later
show how this hypothesis may be tested, and how it may be
determined whether the number assumed is, in fact,
sufficient to account for the data.

* For a detailed exposition of the arithmetical procedure of
Lawley's method, with checks, see Emmett (1949).

2. A numerical example.—In order to illustrate the calculations
needed we shall reproduce an example used by
Lawley (1943b), where eight tests were given to 443 individuals.
The table below gives the correlations between
the eight tests, unities having been placed in the diagonal
cells. In this example the hypothesis made is that two
general factors, together with specifics, are sufficient to
account for the observed correlations.


          1       2       3       4       5       6       7       8

  1     1.000    .312    .405    .457    .500    .350    .521    .564
  2      .312   1.000    .460    .316    .279    .173    .339    .288
  3      .405    .460   1.000    .394    .380    .258    .433    .323
  4      .457    .316    .394   1.000    .460    .222    .516    .486
  5      .500    .279    .380    .460   1.000    .239    .441    .417
  6      .350    .173    .258    .222    .239   1.000    .302    .262
  7      .521    .339    .433    .516    .441    .302   1.000    .547
  8      .564    .288    .323    .486    .417    .262    .547   1.000


The method of estimation about to be described is one 
of successive approximations. Each successive step in the 
calculations gives a set of factor loadings which are nearer 
to the final values than those of the previous set. To 
start the process it is only necessary to guess or to find by 
some means (e.g. by a centroid analysis) first approxima- 
tions to the factor loadings. Any set of figures within 
reason will serve the purpose, though, of course, the better 
the approximation the fewer steps in the calculation will 
be needed. For illustration we shall take as first approxi- 
mations to the factor loadings the set of values given below : 


                                        Tests
Trial loading in       1       2       3       4       5       6       7       8

Factor I              .73     .50     .66     .66     .62     .40     .73     .70
Factor II             .17    −.27    −.47     .08     .06     .02     .10     .29
Specific variance    .4382   .6771   .3435   .5580   .6120   .8396   .4571   .4259

Under the loadings are written the corresponding first
approximations to the specific variances (the total variance
of each test being taken to be unity). They are as usual
found by subtracting from unity the sums of squares of
the loadings for each test.

The calculations necessary for obtaining second approximations
to the loadings in factor I may now be set out as
follows :

  (a)   1.666    .738   1.921   1.183   1.013    .476   1.597   1.644
  (b)   5.647   3.895   5.132   5.129   4.830   3.100   5.647   5.412
  (c)   4.917   3.395   4.472   4.469   4.210   2.700   4.917   4.712

                  h₁² = 45.724        1/h₁ = .14789

  (d)    .727    .502    .661    .661    .623    .399    .727    .697

The first row of figures, row (a), is found by dividing the
trial loadings in factor I by the corresponding specific
variances. The figures in row (b) are then given by the
inner products (see footnote, page 74) of row (a) with the
successive rows (or columns) of the correlation table
printed above, and row (c) is obtained by subtracting
from the figures in row (b) the corresponding loadings in
factor I. The quantity h₁² is given by the inner product
of rows (a) and (c), and hence, taking the square root of the
reciprocal of this quantity, we find 1/h₁. Finally, row (d)
is obtained by multiplying the figures in row (c) by 1/h₁,
or .14789. The resulting numbers are then second
approximations to the loadings of the tests in factor I.
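
The steps for rows (a) to (d) can be sketched in Python as follows. This is an illustration under stated assumptions rather than Lawley's own working: the correlation matrix and trial loadings are typed in from the tables above, the variable names are hypothetical, and intermediate figures are carried at full precision, so the last decimal may differ slightly from the printed rows.

```python
# Correlation matrix of the eight tests (unities in the diagonal).
R = [
    [1.000, .312, .405, .457, .500, .350, .521, .564],
    [.312, 1.000, .460, .316, .279, .173, .339, .288],
    [.405, .460, 1.000, .394, .380, .258, .433, .323],
    [.457, .316, .394, 1.000, .460, .222, .516, .486],
    [.500, .279, .380, .460, 1.000, .239, .441, .417],
    [.350, .173, .258, .222, .239, 1.000, .302, .262],
    [.521, .339, .433, .516, .441, .302, 1.000, .547],
    [.564, .288, .323, .486, .417, .262, .547, 1.000],
]
trial_1 = [.73, .50, .66, .66, .62, .40, .73, .70]     # trial loadings, factor I
trial_2 = [.17, -.27, -.47, .08, .06, .02, .10, .29]   # trial loadings, factor II


def inner(u, v):
    """Inner product of two rows of figures."""
    return sum(x * y for x, y in zip(u, v))


# Specific variances: unity less the sum of squares of the loadings.
spec = [1 - a * a - b * b for a, b in zip(trial_1, trial_2)]

row_a = [l / v for l, v in zip(trial_1, spec)]      # loadings / specific variances
row_b = [inner(row_a, r) for r in R]                # inner products with the table
row_c = [b - l for b, l in zip(row_b, trial_1)]     # less the trial loadings
h_sq = inner(row_a, row_c)                          # h1 squared
row_d = [c / h_sq ** 0.5 for c in row_c]            # second approximations

print("h1^2 =", round(h_sq, 3))
print("(d)  ", [round(x, 3) for x in row_d])
```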

The most direct way of obtaining second approximations 
to the loadings in factor II is to find the residual matrix 
which results from removing the effect of factor I, and to 
treat it in the same way as the original matrix, using this 
time the trial loadings in factor II. A less direct but con- 
siderably shorter method may, however, be obtained by using 
once more the original matrix and modifying the process 
slightly. The necessary calculations are as shown below : 


  (e)    .388   −.399  −1.368    .143    .098    .024    .219    .681
  (f)    .330   −.560   −.980    .150    .113    .038    .190    .580

                            p₁ = −.0234

  (g)    .177   −.278   −.495    .085    .068    .027    .107    .306

                  k₁² = 1.1080        1/k₁ = .9500

  (h)    .168   −.264   −.470    .081    .065    .026    .102    .291

Row (e) is found by dividing the trial loadings in factor II
by the corresponding specific variances (thus, .388 is
.17/.4382), while the numbers in row (f) are given by the
inner products of row (e) with the rows of the correlation
table.
The step by which row (g) is obtained from row (f) is
a little more complicated than the corresponding step in
the calculations for the first-factor loadings. From each
number in row (f) we subtract not only the corresponding
trial loading in factor II, but also a correction which
eliminates the effect of factor I; this correction consists
of the corresponding number in row (d) multiplied by
−.0234, the inner product of rows (e) and (d). Thus, for
example, the number .177 in row (g) is equal to

        .330 − .170 − .727 × (−.0234)

In general, where more than two factors are assumed to be
present and where further approximations are being calculated
for the loadings in the rth factor, there will be (r − 1)
such corrections to be subtracted, one for each of the
preceding factors.

Having found row (g) the quantity k₁² is now given by the
inner product of rows (e) and (g), from which, taking the
square root of the reciprocal, we derive 1/k₁. Row (h)
is then obtained by multiplying the figures in row (g) by
1/k₁, or .9500. We have thus found second approximations
to the loadings in factor II.
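
The shorter method for factor II can be sketched in the same way. The sketch below starts from the printed rows (d), (e) and (f), so that it needs only the correction step; the variable names are hypothetical, and because the inputs are already rounded to three decimals the results agree with rows (g) and (h) only to about the last printed place.

```python
row_d = [.727, .502, .661, .661, .623, .399, .727, .697]      # second approximations, factor I
row_e = [.388, -.399, -1.368, .143, .098, .024, .219, .681]   # trial loadings II / specific variances
row_f = [.330, -.560, -.980, .150, .113, .038, .190, .580]    # inner products of (e) with the table
trial_2 = [.17, -.27, -.47, .08, .06, .02, .10, .29]          # trial loadings, factor II


def inner(u, v):
    """Inner product of two rows of figures."""
    return sum(x * y for x, y in zip(u, v))


# Correction multiplier that eliminates the effect of factor I.
p1 = inner(row_e, row_d)

# Row (g): subtract the trial loading and the factor I correction.
row_g = [f - t - d * p1 for f, t, d in zip(row_f, trial_2, row_d)]

k_sq = inner(row_e, row_g)                    # k1 squared
row_h = [g / k_sq ** 0.5 for g in row_g]      # second approximations, factor II

print("p1 =", round(p1, 4), " k1^2 =", round(k_sq, 4))
print("(h) ", [round(x, 3) for x in row_h])
```

With more than two factors the same pattern would repeat, one such correction being subtracted for each preceding factor.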

The whole cycle of calculations may now be repeated
over and over again until the required degree of accuracy
is reached. In practice, provided that the initial trial
loadings are not too far out, one repetition of the process
will usually be found sufficient. In our example the final
estimates (with possible slight errors in the last decimal
place) were as follows :

                                        Tests
Loading in             1       2       3       4       5       6       7       8

Factor I              .725    .503    .664    .661    .623    .399    .726    .694
Factor II             .172   −.261   −.468    .087    .069    .027    .106    .291
Specific variance     .445    .679    .340    .556    .607    .840    .462    .434

Having obtained these figures, there is, of course, no
objection to rotating the factors as desired in order to
reach a psychologically acceptable position.

3. Testing significance.—A difficulty in most systems of
factorial analysis is to know how many factors it is worth
while to “take out,” and to decide how many of them may
be considered significant. From a statistical point of
view objections can be raised against the majority of
methods at present in use for this purpose. When, however,
the number of individuals tested is fairly large, the
maximum likelihood method provides a satisfactory means
of testing whether the factors fitted can be considered
sufficient to account for the data.

To illustrate this let us return to the example of the 
previous section. It is first of all necessary to calculate 
the matrix of residuals obtained when the effect of both 
factors is removed from the original correlation matrix. 
For this purpose we use the final estimates of the loadings 
as already given. The residual matrix, with the specific 
variances inserted in the diagonal cells, is as follows : 


          1       2       3       4       5       6       7       8

  1    (.445)  −.008    .004   −.037    .036    .056   −.024    .011
  2    −.008   (.679)   .004    .006   −.016   −.021    .001    .015
  3     .004    .004   (.340)  −.004   −.001    .006    .001   −.002
  4    −.037    .006   −.004   (.556)   .042   −.044    .027    .002
  5     .036   −.016   −.001    .042   (.607)  −.011   −.019   −.035
  6     .056   −.021    .006   −.044   −.011   (.840)   .009   −.023
  7    −.024    .001    .001    .027   −.019    .009   (.462)   .012
  8     .011    .015   −.002    .002   −.035   −.023    .012   (.434)


We are now able to calculate a criterion, which we shall
denote by w, for deciding whether the hypothesis that only
two general factors are present should be accepted or
rejected. Each of the above residuals is squared and
divided by the product of the numbers in the corresponding
diagonal cells. Thus, for example, the residual for
Tests 4 and 7 is squared and divided by the product of
the fourth and seventh diagonal elements, giving the result

        (.027)² / (.556 × .462) = .002838

There are altogether 28 such terms, one for each residual,
and w is obtained by forming the sum of these terms and
multiplying it by 443, the number in the sample. The
result is found to be 20.1.

When the number in the sample is fairly large w is
distributed approximately as χ² with degrees of freedom
given by

        ½{(n − m)² − n − m}

where n is the number of tests and m is the assumed number
of factors. To test whether the above value of w is
significant we now use a χ² table such as is given by
Fisher and Yates (1938, page 27). In our case, putting
n = 8 and m = 2, the number of degrees of freedom is 13.
Entering the χ² table with 13 degrees of freedom, we find
that the 1 per cent. significance level is 27.7. This means
that if our hypothesis that only two general factors are
present is correct, then the chance of getting a value of w
greater than 27.7 is only 1 in 100. If, therefore, we had
obtained a value of w greater than 27.7 we should have
been justified in rejecting the above hypothesis and in
assuming the existence of more than two general factors.
In our case, however, the value of w is only 20.1, well below
the 1 per cent. significance level. We have thus no
grounds for rejection, and although we cannot state that
only two general factors are present, we have no reason to
assume the existence of more than two.
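
The criterion w and its degrees of freedom can be retraced with the short sketch below. It assumes the residuals exactly as printed to three decimals, and takes the eighth specific variance as .434, that is, unity less the squared loadings of test 8. With rounded inputs the result comes out close to, but not exactly at, the 20.1 quoted above. The variable names are hypothetical.

```python
N = 443                      # number in the sample
n_tests, m_factors = 8, 2

# Specific variances (the diagonal cells of the residual matrix).
spec = [.445, .679, .340, .556, .607, .840, .462, .434]

# Residuals above the diagonal, row by row: (1,2)...(1,8), (2,3)...(2,8), etc.
upper = [
    [-.008, .004, -.037, .036, .056, -.024, .011],
    [.004, .006, -.016, -.021, .001, .015],
    [-.004, -.001, .006, .001, -.002],
    [.042, -.044, .027, .002],
    [-.011, -.019, -.035],
    [.009, -.023],
    [.012],
]

total = 0.0
terms = 0
for i, row in enumerate(upper):
    for offset, resid in enumerate(row):
        j = i + 1 + offset
        # Squared residual divided by the product of the two diagonal cells.
        total += resid ** 2 / (spec[i] * spec[j])
        terms += 1

w = N * total
df = ((n_tests - m_factors) ** 2 - n_tests - m_factors) // 2

print("terms =", terms, " w =", round(w, 1), " d.f. =", df)
print("exceeds the 1 per cent. level of 27.7:", w > 27.7)
```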

It must be emphasized that the method described above 
is not applicable if other, inefficient, estimates of the 
loadings are substituted for the maximum likelihood 
estimates. For the value of χ² would in that case be
greatly exaggerated, causing us to over-estimate its
significance. For this reason we cannot, for example,
use the method for testing the significance of the residuals
left when factors have been fitted by the centroid
method.


4. The standard errors of individual residuals.—A method
has now* been developed for finding the standard errors
of individual residuals. This should be useful when a few
of the residuals are very large, while the rest are small.
In such a case one or more of the residuals may be highly
significant, when tested individually, even though the
value of χ² does not attain significance. The method
ignores errors of estimation of the specific variances, which
are not, however, likely to be very large provided that the
number of tests in the battery is not too small.

* Lawley in the Proc. Roy. Soc., Edin., 1949.

Let us denote by lᵢ, mᵢ the estimated loadings of the i-th
test in the first and second factors respectively (assuming
the existence of only two factors). Let vᵢ be the specific
variance of the i-th test, and let us write:

$$h = \sum_i \frac{l_i^2}{v_i}, \qquad k = \sum_i \frac{m_i^2}{v_i}.$$

Then the standard error of the residual for the i-th and j-th
tests (i ≠ j) is given by:

$$\sqrt{\frac{1}{N}\left(e_{ii}\,e_{jj} + e_{ij}^{2}\right)}$$

where

$$e_{ii} = v_i - \frac{l_i^2}{h} - \frac{m_i^2}{k}$$

and

$$e_{ij} = -\frac{l_i l_j}{h} - \frac{m_i m_j}{k},$$

N being the number in the sample.

This formula may, of course, be easily extended to take 
into account any number of factors. 
Let us illustrate the use of the above formula with the
same numerical example as before. If we wish to test the
significance of the residual for the first and fourth tests
after removing two factors, we have:

        l₁ = .725       m₁ = .172       v₁ = .44479
        l₄ = .661       m₄ = .087       v₄ = .55551
        h = 6.7185      k = 1.0528

Hence   e₁₁ = .33845    e₄₄ = .48329    e₁₄ = −.08554

and     √{(e₁₁ e₄₄ + e₁₄²) / 443} = .0196

Thus the residual in question has a value of .037 with a
standard error of .020. It is clearly not significant.
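
The worked example may be checked with a few lines of Python. The sketch takes the formula in the form √{(e₁₁e₄₄ + e₁₄²)/N}, with e₁₁ = v₁ − l₁²/h − m₁²/k and e₁₄ = −l₁l₄/h − m₁m₄/k; this reading is an assumption, adopted because it reproduces the figures printed above. The variable names are hypothetical.

```python
from math import sqrt

N = 443                                   # number in the sample
l1, m1, v1 = .725, .172, .44479           # test 1: loadings and specific variance
l4, m4, v4 = .661, .087, .55551           # test 4
h, k = 6.7185, 1.0528                     # sums of l^2/v and m^2/v over the tests

e11 = v1 - l1 ** 2 / h - m1 ** 2 / k
e44 = v4 - l4 ** 2 / h - m4 ** 2 / k
e14 = -l1 * l4 / h - m1 * m4 / k

std_error = sqrt((e11 * e44 + e14 ** 2) / N)

residual = .037                           # size of the residual for tests 1 and 4
print("e11, e44, e14 =", round(e11, 5), round(e44, 5), round(e14, 5))
print("standard error =", round(std_error, 4))
print("residual / standard error =", round(residual / std_error, 2))
```

The residual is less than twice its standard error, which agrees with the conclusion that it is not significant.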
5. The standard errors of factor loadings.—When maximum
likelihood estimation has been used, we are able to
find the standard errors of not only the residuals but also
the estimated factor loadings. Using the same notation as
in the preceding section, with N again the number in the
sample, the sampling variance of lᵢ, the loading of the i-th
test in the first factor, is (assuming the test to be
standardized):

$$\frac{1}{N}\left(1 + \frac{1}{h}\right)\left\{1 - \tfrac{1}{2}\left(1 + \frac{1}{h}\right) l_i^{2}\right\}$$

and the standard error is the square root of this.


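The covariance between any two first factor loadings lᵢ
and lⱼ is given by:

$$\frac{1}{N}\left(1 + \frac{1}{h}\right)\left\{r_{ij} - \tfrac{1}{2}\left(1 + \frac{1}{h}\right) l_i l_j\right\}$$

The formulae for the variances and covariances of the
subsequent factor loadings are more complex. Thus the
variance of mᵢ, the loading of the i-th test in the second
factor, is:

$$\frac{1}{N}\left(1 + \frac{1}{k}\right)\left\{1 - \left(1 + \frac{1}{h}\right) l_i^{2} - \tfrac{1}{2}\left(1 + \frac{1}{k}\right) m_i^{2}\right\}$$

while the covariance between mᵢ and mⱼ is

$$\frac{1}{N}\left(1 + \frac{1}{k}\right)\left\{r_{ij} - \left(1 + \frac{1}{h}\right) l_i l_j - \tfrac{1}{2}\left(1 + \frac{1}{k}\right) m_i m_j\right\}$$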

The results for the general case, where more than two
factors have been assumed present, may be written down
without difficulty. Each factor will give rise to one more
term within the curly brackets than the preceding factor.
It should be noted that the last of such terms, and that
alone, is multiplied by ½.

The variances and covariances of loadings in any factor
are those for given values of the loadings in all preceding
factors.

It must be stressed that all the above results are applicable
only to the unrotated loadings.

In our numerical example, we find:

        1 + 1/h = 1.14884
        1 + 1/k = 1.9498

Hence the variance of l₁, for example, is

        (1.14884 / 443) {1 − ½ × 1.14884 × .725²} = .001810

while that of m₁ is:

        (1.9498 / 443) {1 − 1.14884 × .725² − ½ × 1.9498 × .172²} = .001617

Thus the loading of test 1 in the first factor is .725, with a
standard error of

        √.001810 = .043

and its loading in the second factor is .172, with a standard
error of

        √.001617 = .040
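
These two standard errors can be reproduced as below. The sketch assumes the variance formulae in the form that matches the numerical working just given, namely (1/N)(1 + 1/h){1 − ½(1 + 1/h)l²} for a first-factor loading and (1/N)(1 + 1/k){1 − (1 + 1/h)l² − ½(1 + 1/k)m²} for a second-factor loading. The variable names are hypothetical.

```python
from math import sqrt

N = 443                    # number in the sample
h, k = 6.7185, 1.0528      # as found in the preceding section
l1, m1 = .725, .172        # loadings of test 1 in factors I and II

a = 1 + 1 / h              # 1 + 1/h
b = 1 + 1 / k              # 1 + 1/k

# Only the last term inside the curly brackets carries the factor one-half.
var_l1 = a / N * (1 - 0.5 * a * l1 ** 2)
var_m1 = b / N * (1 - a * l1 ** 2 - 0.5 * b * m1 ** 2)

print("1 + 1/h =", round(a, 5), "  1 + 1/k =", round(b, 4))
print("variance of l1 =", round(var_l1, 6), " s.e. =", round(sqrt(var_l1), 3))
print("variance of m1 =", round(var_m1, 6), " s.e. =", round(sqrt(var_m1), 3))
```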

6. Advantages and disadvantages—To sum up: the 
chief advantage of the maximum likelihood method of 
estimating factor loadings is that it does lead to efficient 
estimates and does provide a means of deciding how many 
factors may be considered necessary. It unfortunately 
takes, however, much longer to perform than a centroid 
analysis, particularly when the battery of tests is a large 
one and when several factors are to be fitted. The chief 
labour of the process lies in the calculation of the various 
inner products ; although in this respect it does not differ 
greatly from Hotelling’s method of finding “ principal 
components.” The maximum likelihood method is thus 
likely to be most useful in cases where accurate estimation 
is desirable and where it is proposed to make a test of 
significance. 

The method also possesses the advantage of being 
independent of the units in which the test scores are 
measured. The same system of factors is therefore 
obtained whether the correlation or the covariance matrix 
is analysed. The loadings in the one case are directly
proportional to those in the other.


PART III
THE ROTATION OF FACTORS


CHAPTER X 
THE ROTATION OF FACTORS 


1. Is rotation necessary ?—The factors or axes arrived at 
by the centroid process (or as principal components) are 
not at all the same sort of things as the Spearman system 
and its extensions gave. The Spearman factors, though 
mathematical devices are used in calculating their loadings, 
have psychological meaning from the first. Their names
indicate this: general intelligence, the verbal factor, etc.
There is no need for rotating them. 

With the other kind of factor, the case is different. As 
first obtained, they make no claim to have psychological 
meaning. Their virtue is a purely mathematical virtue— 
they each explain, in turn, as much as possible of the vari- 
ance of the tests, and arrive with as few common factors 
as possible at negligible residues. The loadings of the 
first centroid* factor are usually all positive, and it runs as 
a positive factor through all the tests. But it is not as a 
rule identical with Spearman’s g. The succeeding cen- 
troid factors have each negative loadings in about half the 
tests, and are often referred to as bipolar factors. They 
may be looked upon as repeatedly classifying the tests into 
subgroups, and this classification may be expressed by a 
kind of family tree : 


Factor I                      All loadings positive
                                        |
                         +--------------+--------------+
                         |                             |
Factor II        Positive loadings             Negative loadings
                         |                             |
                   +-----+-----+                 +-----+-----+
                   |           |                 |           |
Factor III     Positive    Negative          Positive    Negative
               loadings    loadings          loadings    loadings


Not infrequently the sub-families into which this bipolar
classification analyses the tests will have something psychological
in common, and to that extent these factors in such
cases may claim to have psychological meaning. Much
depends on how the battery of tests is made up. And
such bipolar classification is more natural in tests of temperament
and character, where common speech has many
bipolar phrases (as brave-cowardly, modest-cheeky, etc.),
than in tests of an intellectual nature, though there too
bipolar pairs of words are found, like clever-stupid.

* This is the most convenient name, to avoid verbosity. But
unless it is otherwise stated, may it be understood that principal
components are equally referred to.

Many psychologists, however, especially if they tend to 
look upon factors as real mental entities, even perhaps with 
physiological causes, find it difficult to admit all those 
negative loadings. A mental ability or factor, they argue, 
is on the whole something which helps us to do things, not 
hinders. A few negative loadings they can understand : 
but not so many as half the loadings. So they wish to 
turn the centroid axes into positions where most of the 
loadings will be positive, and moreover positions to which 
they can give psychological meaning, and which will be 
found and be recognizable in different batteries of tests. 
For this purpose the factor-analyst must be instructed in 
methods of rotating the centroid factors into new positions. 

2. Methods of rotation—One method, Alexander’s, has 
already been described earlier in this book on pages 79 
to 80. It was used by Alexander himself with excellent 
effect (Alexander, 1935), but involves assuming (a) that the 
communality of a certain test is entirely due to one factor ; 
(b) that the communality of a second test is entirely due 
to this factor and one other ; (c) and so on for r − 1 tests,
where r is the number of factors. The criterion of success
with this method is to see whether, when these assumptions 
are made, negative loadings disappear; and whether the 
consequent loadings of those tests about which no assump- 
tions are made are compatible with the psychologist’s 
psychological analysis of them. Alexander’s assumptions, 
however, cannot generally be made in a usual battery of 
tests, and other methods of rotation are required. The 
simplest plan is to rotate the factors two at a time in their 
own plane. An example will best explain this. 

3. Two-by-two rotation.—Let us suppose that we have
the following set of loadings in eight tests for three factors :

  Test      I₀      II₀     III₀      h²

   1        .4      .4      .1       .33
   2        .7      .3     −.4       .74
   3        .7     −.2     −.3       .62
   4        .9     −.1      .3       .91
   5        .5      .2     −.2       .33
   6        .8     −.4      .1       .81
   7        .6      .5     −.2       .65
   8        .5     −.3      .4       .50

Suppose further that we want to rotate to positions of the
three axes where there will be no negative loadings, or at
least only few, and those small. We shall do this taking
the axes two by two, and rotating each pair in its own plane.
Take first axes I₀ and II₀, where the subscripts indicate
that no rotation has yet taken place. Draw a diagram,
using the loadings on I₀ and II₀ as co-ordinates
(Figure 21). We can see at once that if we rotate the
axes to new positions I₁ and II₁ they will enclose all the
test points in their positive quadrant, and all the loadings
on these two axes will be positive. The position is,
however, not unique, for we could have rotated a little
farther, or a little less, than θ and still enclosed all the
points. I have taken θ as 37°, with sine θ = .6 and cosine
θ = .8.

[Figure 21.]

Consider now the point 5. Its co-ordinates on the former
axes were .5 and .2, and clearly its new co-ordinates are:

        .5 cos θ − .2 sin θ = .28
and     .5 sin θ + .2 cos θ = .46

These can be checked approximately on the diagram, and
this should always be done, at least by eye if not by
measurement. The new loadings of each of the tests can
be calculated in the same way, giving:

  Test      I₁      II₁     Sum of squares

   1        .08     .56         .32
   2        .38     .66         .58
   3        .68     .26         .53
   4        .78     .46         .82
   5        .28     .46         .29
   6        .88     .16         .80
   7        .18     .76         .61
   8        .58     .06         .34

At this point two checks should be made: (1) The sum
of the squares of the loadings of any test in these two factors
should not have altered. Thus .08² + .56² is the same as
.4² + .4² for the first test. (2) The inner product of any
pair of rows should not have altered. Thus, for the first
two tests

        .4 × .7 + .4 × .3 = .40
and
        .08 × .38 + .56 × .66 = .4000

It is sufficient to check only adjacent rows.
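
The first two-by-two rotation and its two checks can be sketched in Python. The loadings are those of the eight tests given above; the function name is hypothetical, and the rotation is written exactly as in the working for point 5, with sine θ = .6 and cosine θ = .8.

```python
# Loadings (I0, II0, III0) of the eight tests before any rotation.
loadings = [
    (.4, .4, .1), (.7, .3, -.4), (.7, -.2, -.3), (.9, -.1, .3),
    (.5, .2, -.2), (.8, -.4, .1), (.6, .5, -.2), (.5, -.3, .4),
]
SIN_T, COS_T = .6, .8      # theta of about 37 degrees


def rotate_pair(x, y, sin_t, cos_t):
    """New co-ordinates of a point when two axes are turned in their own plane."""
    return x * cos_t - y * sin_t, x * sin_t + y * cos_t


rotated = [rotate_pair(a, b, SIN_T, COS_T) for a, b, _ in loadings]

# Check (1): the sum of squares in the two factors is unaltered for each test.
squares_ok = all(
    abs(a * a + b * b - (x * x + y * y)) < 1e-12
    for (a, b, _), (x, y) in zip(loadings, rotated)
)

# Check (2): the inner product of each pair of adjacent rows is unaltered.
products_ok = all(
    abs(
        (loadings[i][0] * loadings[i + 1][0] + loadings[i][1] * loadings[i + 1][1])
        - (rotated[i][0] * rotated[i + 1][0] + rotated[i][1] * rotated[i + 1][1])
    ) < 1e-12
    for i in range(len(loadings) - 1)
)

for test, (x, y) in enumerate(rotated, start=1):
    print(test, round(x, 2), round(y, 2))
print("checks:", squares_ok, products_ok)
```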


[Figure 22.]

Our three axes are now I₁, II₁, and III₀, and III₀
still has negative loadings. We must therefore rotate it
with one of the others, which will have its loadings
further changed. Let us choose I₁ and III₀, and with
their loadings make this diagram (Figure 22).

A little trial with a square corner of a piece of paper
shows us that we cannot rotate the axes to a position
which will completely enclose all the points, though
we very nearly can. We finally decide to make I₂ go
exactly through point 2, whose co-ordinates in this diagram
are .38 and −.4. The sine and cosine of θ are therefore :

        .4 / √(.38² + .4²)    and    .38 / √(.38² + .4²)
or      .725    and    .689
        (check that .725² + .689² = unity)

The loadings of the point 5, for example, are then :

        .689 × .28 − .725 × (−.2) = .338 on I₂
and     .725 × .28 + .689 × (−.2) = .065 on III₂

as can be approximately checked by a look at the diagram.


In the same way the other loadings on I₂ and III₂ can be
found, giving the complete table :

  Test      I₂      II₁     III₂       h²

   1      −.017    .560    .127     .3300
   2       .552    .660    .000     .7403
   3       .686    .260    .286     .6200
   4       .320    .460    .772     .9100
   5       .338    .460    .065     .3301
   6       .534    .160    .707     .8106
   7       .269    .760   −.007     .6500
   8       .110    .060    .696     .5001

The sums of squares of each row ought to give the same
values for h² as did the original table in I₀, II₀, and III₀.
And the inner product of any pair of rows ought to be
identical also. For example, taking the last pair (it is
sufficient to check adjacent rows), we have from this table :

        .269 × .110 + .760 × .060 − .007 × .696 = .0703

and from the other :

        .6 × .5 − .5 × .3 − .2 × .4 = .07

We have now succeeded in replacing our original analysis,
which had many negative loadings, by one which has only
positive loadings (except for the two loadings which,
although negative, are nearly zero), and gives the same
correlations and communalities.
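
The second rotation, in which the new axis is made to pass through point 2, may be sketched as follows. The sine and cosine are computed from the co-ordinates of that point instead of being rounded to .725 and .689, so the third decimal may occasionally differ from the table; the variable names are hypothetical.

```python
from math import hypot

# Loadings after the first rotation (I1, II1) and the untouched III0.
i1 = [.08, .38, .68, .78, .28, .88, .18, .58]
ii1 = [.56, .66, .26, .46, .46, .16, .76, .06]
iii0 = [.1, -.4, -.3, .3, -.2, .1, -.2, .4]

# Make the new axis I2 pass exactly through point 2 (co-ordinates .38 and -.4).
x, z = i1[1], iii0[1]
length = hypot(x, z)
sin_t, cos_t = -z / length, x / length

i2 = [a * cos_t - c * sin_t for a, c in zip(i1, iii0)]
iii2 = [a * sin_t + c * cos_t for a, c in zip(i1, iii0)]

# Communalities from the rotated loadings; these should equal the original h^2.
h2 = [a * a + b * b + c * c for a, b, c in zip(i2, ii1, iii2)]

print("sine, cosine =", round(sin_t, 3), round(cos_t, 3))
for test, row in enumerate(zip(i2, ii1, iii2, h2), start=1):
    print(test, [round(v, 3) for v in row])
```

Point 2 receives a zero loading on the third axis, as it must when the first axis passes through it.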

4. An orthogonal rotating matrix.—If the reader will, in
imagination, picture in his mind those original axes I₀, II₀,
and III₀ as three lines at right angles to each other (orthogonal,
as we say), he can further imagine them being
turned bodily, using their common meeting-place as the
swivelling-point and keeping them orthogonal, into their
final positions I₂, II₁, and III₂. Actually we did it in two
steps, but imagine it happening as one complex movement.

Arithmetically, this one movement can be imitated by
“post-multiplying the original table of loadings by an
orthogonal matrix,” a piece of jargon we must hasten to
explain. And the reader may miss this section out on
first reading. A matrix, in mathematics, is an oblong or
square set of numbers, to be used as an operator on other
quantities. In our case it is to be used to rotate the original
loadings to new positions. And since we want the axes to
remain orthogonal, we have to use an orthogonal matrix,
i.e. one in which the sum of the squares of any column or
row is unity, and the inner product of any pair of rows or
of columns is zero. Actually the orthogonal matrix which
performs the rotation of the above section 3 is :

         .5512    .6000    .5800
        −.4134    .8000   −.4350
        −.7250    .0000    .6890


(The reader can check the sum of squares of any column or
row, and any inner product of a pair.) Before explaining
how these numbers are arrived at, let us first perform the
post-multiplication of the table of original loadings (itself an
oblong matrix) by this rotating matrix:

   .4    .4    .1                                       −.017   .560   .127
   .7    .3   −.4                                        .552   .660   .000
   .7   −.2   −.3        .5512    .6000    .5800         .686   .260   .286
   .9   −.1    .3   ×   −.4134    .8000   −.4350    =    .320   .460   .772
   .5    .2   −.2       −.7250    .0000    .6890         .338   .460   .065
   .8   −.4    .1                                        .534   .160   .707
   .6    .5   −.2                                        .269   .760  −.007
   .5   −.3    .4                                        .110   .060   .696

We have to say post-multiplication because in matrix
algebra the product AB is not the same as the product BA.
Matrix multiplication is performed by finding the inner
product of each row of the first matrix with each column of
the second matrix. Thus

        .4 × .5512 − .4 × .4134 − .1 × .7250 = −.0174 or −.017

is the first item in the product matrix above. Similarly, the
quantity .707, which appears in the sixth row and third
column of the product matrix, is the inner product of the
sixth row of the first matrix and the third column of the
second:

        .8 × .5800 + .4 × .4350 + .1 × .6890 = .7069 or .707

The reader can similarly check the other entries in the
product matrix.
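
The post-multiplication, and the test of orthogonality of the rotating matrix, can be written out in plain Python as below. This is a minimal sketch with hypothetical names; the helper multiplies matrices exactly as described, by taking the inner product of each row of the first matrix with each column of the second.

```python
# Original loadings (I0, II0, III0) of the eight tests.
loadings = [
    [.4, .4, .1], [.7, .3, -.4], [.7, -.2, -.3], [.9, -.1, .3],
    [.5, .2, -.2], [.8, -.4, .1], [.6, .5, -.2], [.5, -.3, .4],
]

# The orthogonal rotating matrix of section 3.
rotation = [
    [.5512, .6000, .5800],
    [-.4134, .8000, -.4350],
    [-.7250, .0000, .6890],
]


def matmul(a, b):
    """Inner product of each row of a with each column of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


product = matmul(loadings, rotation)

# For an orthogonal matrix, the transpose times the matrix is the unit matrix:
# sums of squares of columns are unity and inner products of pairs are zero.
gram = matmul(transpose(rotation), rotation)

for test, row in enumerate(product, start=1):
    print(test, [round(v, 3) for v in row])
print([[round(v, 3) for v in row] for row in gram])
```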

When we performed the first of our previous two-by-two
rotations we were in effect post-multiplying the loadings by
the rotating matrix:

         .8     .6     0
        −.6     .8     0
          0      0     1

which will leave the column III₀ unchanged because of the
nature of the third column of this rotating matrix. The
inner product of 0, 0, and 1 with any row of the centroid
loadings will give a column of loadings identical with III₀.

When we performed the second two-by-two rotation, of
I₁ and III₀, we were in effect multiplying by the matrix:

         .689    0    .725
         .000    1    .000
        −.725    0    .689

which clearly does not alter the middle axis. And the
rotating matrix which would have done these two operations
simultaneously is:

    .8    .6    0        .689    0    .725         .5512    .6000    .5800
   −.6    .8    0   ×    .000    1    .000    =   −.4134    .8000   −.4350
     0     0    1       −.725    0    .689        −.7250    .0000    .6890
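
The product of the two plane rotations can be verified directly. The sketch below uses a hypothetical helper; it also multiplies the matrices in the reverse order, to illustrate why the order of multiplication has to be stated.

```python
first = [
    [.8, .6, 0],
    [-.6, .8, 0],
    [0, 0, 1],
]
second = [
    [.689, 0, .725],
    [0, 1, 0],
    [-.725, 0, .689],
]


def matmul(a, b):
    """Inner product of each row of a with each column of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


combined = matmul(first, second)        # first rotation, then the second
reversed_order = matmul(second, first)  # the other order gives a different matrix

for row in combined:
    print([round(v, 4) for v in row])
```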
5. Reyburn and Taylor's method.—These South African
psychologists have proposed to let psychological insight
alone guide the rotations to which axes are subjected.
They do not necessarily insist on a g (see their 1941a, pages
253, 254, 258, etc.). Their plan is to choose a group of
tests which their psychological knowledge, and a study of
all that is previously known, leads them to consider to be 
clustered round a factor. They therefore cause one of their 
axes to pass through the centroid of this cluster, keeping all 
axes orthogonal. This factor axis they do not subse- 
quently move. They then formulate a hypothesis about 
a second factor and select a second group of tests, through 
whose centroid (retaining orthogonality) they pass their 
second factor axis. And so on. There is some affinity 
between this and Alexander’s method of rotation (see 
page 79). 

The arithmetical details of their method are as follows. 
They first obtain a table of centroid loadings in the usual 
way. Then, having chosen a group of tests which they 
think form, psychologically, a cluster, they add together 
the rows of the centroid table which refer to those tests, 
thus obtaining numbers proportional to the loadings of 
their centroid. These, after being normalized, form the 
first column of their rotating matrix. For example, 
consider this (imaginary and invented) table of loadings : 


                  Loadings
  Test      I       II      III       h²

   1        .4      .3      .1       .26
   2        .5     −.3     −.6       .70
   3        .6     −.3     −.3       .54
   4        .5      .2      .1       .30
   5        .4      .4     −.2       .36
   6        .5     −.4      .2       .45
   7        .5      .2     −.1       .30
   8        .7      .4      .1       .66
   9        .7     −.2      .3       .62
  10        .6     −.4      .4       .68

Reyburn and Taylor now decide, let us suppose, that Tests
9 and 10 are, in their psychological view, very strongly
impregnated with a verbal factor, and determine to rotate
their original factors until one of them passes through the
centroid of these two tests. They extract their rows, add
them together, and normalize the three totals thus :

   (9)      .7     −.2      .3
  (10)      .6     −.4      .4
           1.3     −.6      .7      Sum of squares 2.54 = 1.594²

           .816   −.376    .439     obtained by dividing by 1.594


If the columns of the original table are multiplied by these 
three numbers and the rows added, the result is the first 
column of the rotated factor loadings in the table below. 
To get the other two columns we must complete the rotating
matrix in such a manner that the axes remain orthogonal.
How this is done will be explained separately later.
Meanwhile, consider the matrix:

         .816     .399     .417
        −.376    −.183     .909
         .439    −.898     .000

Its first column is composed of the above numbers. It is 
orthogonal, for the sum of the squares of any row or column 
is unity, and the inner product of any two is zero. When 
the original table of loadings is post-multiplied by this we 
get the rotated table : 


  Test        Rotated Loadings           h²

   1       .258     .015     .440      .260
   2       .257     .793    −.064      .699
   3       .471     .564    −.022      .540
   4       .377     .073     .390      .300
   5       .088     .266     .530      .359
   6       .646     .093    −.155      .450
   7       .289     .253     .390      .300
   8       .465     .116     .656      .660
   9       .778     .047     .110      .620
  10       .816    −.047    −.113      .681

At this point the usual two checks must be made, of h² and
of the inner products of consecutive rows.
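
The first stage of the method, passing an axis through the centroid of Tests 9 and 10, can be sketched in Python. The sketch assumes the invented table of loadings above and uses hypothetical names; it normalizes at full precision rather than with the rounded figures .816, −.376, .439, so the third decimal of a loading may differ by a unit from the rotated table.

```python
from math import sqrt

# Invented centroid loadings (I, II, III) of the ten tests.
loadings = [
    (.4, .3, .1), (.5, -.3, -.6), (.6, -.3, -.3), (.5, .2, .1), (.4, .4, -.2),
    (.5, -.4, .2), (.5, .2, -.1), (.7, .4, .1), (.7, -.2, .3), (.6, -.4, .4),
]
cluster = [8, 9]   # zero-based positions of Tests 9 and 10

# Add the rows of the chosen cluster, column by column.
sums = [sum(loadings[i][c] for i in cluster) for c in range(3)]

# Normalize the totals so that their squares sum to unity.
sum_of_squares = sum(s * s for s in sums)
first_column = [s / sqrt(sum_of_squares) for s in sums]

# Loading of every test on the new first factor: the inner product of its row
# with the first column of the rotating matrix.
first_factor = [sum(x * w for x, w in zip(row, first_column)) for row in loadings]

print("totals:", [round(s, 1) for s in sums], " sum of squares:", round(sum_of_squares, 2))
print("first column:", [round(w, 3) for w in first_column])
print("new first factor:", [round(v, 3) for v in first_factor])
```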

The first factor now goes through the centroid of Tests 
9 and 10, and we scan the loadings it has in the other 
tests to see if these are consistent with their psychological 
nature. For instance, Test 5 has practically no loading on 
this verbal factor—is this consistent with our psychological 
opinion of this test ? 

If this scrutiny is satisfactory, the psychologist using 
this method then proceeds to consider where he will place 
his second factor ; for the second and third columns of the 
above loadings have still no necessary psychological mean- 
ing as they stand. Exactly the same procedure is carried 
out with them, the first column being left unaltered. 
Suppose the psychologist decided on Tests 5, 7, 8 as being 
a cluster round (say) a numerical factor. He adds their 
rows— 


(5)     .266     .530
(7)     .253     .390
(8)     .116     .656


        .635    1.576
        .374     .928    when normalized


and uses their normalized totals as the first column of a 
matrix to rotate these last two columns. The matrix 
must be orthogonal, and it is in fact— 


| .374    .928 |
| .928   -.374 |
When the second and third columns are rotated by post-
multiplication by this, the final result is given below.
(The same checks must now be repeated.) The psychologist
now scans column two to see if the loadings of his
numerical factor agree reasonably with his idea of each
test, and is rather sorry to see two negative loadings, but
consoles himself by thinking that they are small. He
must finally try to name his third factor, present to an
appreciable extent only in tests 2 and 3. If he thinks he
recognizes it, he is content.

Final Rotated Loadings


 1  |   .258    .414   -.151
 2  |   .257    .237    .760
 3  |   .471    .191    .532
 4  |   .377    .389   -.078
 5  |   .088    .591    .049
 6  |   .646   -.109    .144
 7  |   .289    .457    .089
 8  |   .465    .652   -.138
 9  |   .778    .120    .002
10  |   .816   -.122   -.001
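The second rotation can be retraced as follows. This is a sketch only: the list `second_third` holds the second and third columns of the first rotated table, and the function name `rotate_pair` is an assumption of ours.

```python
# Columns 2 and 3 of the first rotated table (Tests 1 to 10).
second_third = [(0.015, 0.440), (0.793, -0.064), (0.564, -0.022),
                (0.073, 0.390), (0.266, 0.530), (0.093, -0.155),
                (0.253, 0.390), (0.116, 0.656), (0.047, 0.110),
                (-0.047, -0.113)]

# Orthogonal 2 x 2 matrix whose first column is the normalized
# centroid of Tests 5, 7 and 8.
u, v = 0.374, 0.928
T = [[u, v],
     [v, -u]]

def rotate_pair(pair, matrix):
    """Post-multiply one row (x, y) by a 2 x 2 matrix."""
    x, y = pair
    return (x * matrix[0][0] + y * matrix[1][0],
            x * matrix[0][1] + y * matrix[1][1])

final = [rotate_pair(p, T) for p in second_third]
for test, (second, third) in enumerate(final, start=1):
    print(test, round(second, 3), round(third, 3))
```

The first column of loadings is left untouched by this step, so only two columns are recomputed.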

6. Special orthogonal matrices—To carry out the 
above process the reader needs to have at his disposal 
orthogonal matrices of various sizes, such that he can give 
the first column any desired values. The following will 
serve his purpose. Except for the first one, they are not 
unique, and alternatives can be made. 


Order 2      |  u    v |
             |  v   -u |


Order 3      |  mq    mp    l |
             | -lq   -lp    m |
             |  p    -q     . |

It was from this formula that the matrix used in the last
section, with first column of .816, -.376, .439, was made.
For if we set


                  p = .439

we have           q = .898
and from         mq = .816
we have           m = .909
and thence        l = .417
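The same steps can be put as a small routine. The sketch below assumes the Order 3 pattern with rows (mq, mp, l), (-lq, -lp, m), (p, -q, 0); the function name is ours, and the desired first column is first scaled to unit length so that the result is orthogonal to machine accuracy.

```python
import math

def order3_from_first_column(c1, c2, c3):
    """Orthogonal 3 x 3 matrix of the pattern
         mq   mp   l
        -lq  -lp   m
         p   -q    0
    whose first column is (c1, c2, c3) scaled to unit length.
    Assumes the third entry is not +1 or -1, so that q is not zero."""
    n = math.sqrt(c1 * c1 + c2 * c2 + c3 * c3)
    c1, c2, c3 = c1 / n, c2 / n, c3 / n
    p = c3
    q = math.sqrt(1.0 - p * p)
    m = c1 / q          # from mq = c1
    l = -c2 / q         # from -lq = c2
    return [[m * q, m * p, l],
            [-l * q, -l * p, m],
            [p, -q, 0.0]]

M = order3_from_first_column(0.816, -0.376, 0.439)
for row in M:
    print([round(x, 3) for x in row])
```

Because the routine works to full accuracy, its entries may differ in the third decimal place from those obtained by three-figure hand calculation.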


Order 4      |  a    b   -c   -d |
             |  b   -a    d   -c |
             |  c    d    a    b |
             |  d   -c   -b    a |


This one was used by Reyburn and Taylor in their 1939 
article (page 159). 
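An order-4 matrix of this kind is easily checked by machine. In the sketch below the sign pattern is the one in which each row is a signed rearrangement of a, b, c, d; the third row is fixed by the requirement of orthogonality with the other three. The function names and the numerical first column are ours.

```python
import math

def order4(a, b, c, d):
    """Orthogonal 4 x 4 matrix with first column (a, b, c, d),
    provided a*a + b*b + c*c + d*d = 1."""
    return [[a,  b, -c, -d],
            [b, -a,  d, -c],
            [c,  d,  a,  b],
            [d, -c, -b,  a]]

def gram(matrix):
    """Inner products of every pair of columns."""
    cols = list(zip(*matrix))
    return [[sum(x * y for x, y in zip(u, v)) for v in cols] for u in cols]

# Any four numbers will serve once they are scaled to unit length.
raw = [1.0, 2.0, 3.0, 4.0]
scale = math.sqrt(sum(x * x for x in raw))
Q = order4(*[x / scale for x in raw])

for row in gram(Q):
    print([round(abs(x), 6) for x in row])
```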

Similar matrices of higher order can be made by a 
recipe given by them, viz. multiplying together two or 
more of the above, suitably extended by ones and zeros. 
For example, a matrix orthogonal and with arbitrary first 
column, of order 5, can be made by multiplying together : 


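[Product of two matrices of order 5, each an order-3 matrix
of the above form extended by ones and zeros; the printed
arrays are not recoverable here.]


where l² + m² = p² + q² = 1, and likewise for the two pairs
of symbols used in the second matrix.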

7. Principles deciding where to stop rotation.—We have 
mentioned two principles, (a) the desire to rotate to posi-
tions where there will be few, if any, negative loadings,
though usually this is insufficient to define a final position
uniquely, and (b) Reyburn and Taylor’s plan of following 
their psychological intuition in placing the axes. They 
too accept the need for mainly positive loadings, and they 
keep their axes at right angles. We turn now, in our next 
chapter, to a principle (Simple Structure) which is accepted 
widely in America, though hardly at all in Great Britain. 


CHAPTER XI 
ORTHOGONAL SIMPLE STRUCTURE 


1. Agreement of mathematics and psychology.—It is clear 
that the whole process of multifactor analysis is one by 
which a definition of the primary factors is arrived at by 
satisfying simultaneously certain mathematical principles 
and certain psychological intuitions. When these two 
sides of the process click into agreement, the worker has a 
sense of having made a definite step forward. The two 
support one another. Obviously the goal to be hoped for 
along this line of advance will be the discovery of some 
mathematical process which always leads to a unique set of 
factors mainly acceptable to the psychologist. If such 
could be discovered and found to produce a few factors 
over and above those recognized as already known by other 
means, the new factors would stand a good chance of 
acceptance on the strength of their mathematical descent 
only. And no doubt the psychologist would be prepared 
to make a few concessions and changes in his previous ideas 
to fit in with any mathematical scheme which already gave 
much satisfaction and was objective and unique in its 
results. 

It is here that Thurstone’s notion of “ simple structure ” 
is offered as a solution (Vectors, Chapters 6-8). This idea is 
that the axes are to be rotated until as many as possible of 
them are at right angles to as many as possible of the 
original test vectors ; and that the battery is not suitable 
for defining factors unless such a rotation is uniquely 
possible, a rotation which will leave every axis at right 
angles to at least as many tests as there are factors, and 
every test at right angles to at least one axis. 

When the vectors of a test and a factor are at right 
angles, the loading of the factor in that test is zero. 
Thurstone’s “ simple structure ” is therefore indicated by 
a large number of zeros in the matrix of loadings, so large
that there will be only one position of the axes (if any)
which satisfies the requirement. His search, be it repeated,
is for a set of conditions which will make the solution
unique. We have seen him approaching this goal by
stages. Unless the battery is large, so that


        n ≥ ½ [(2r + 1) + √(8r + 1)]


(see Chapter V, Section 9), the communalities are not
unique. Even when the battery is large enough, the axes
representing factors may be rotated to positions among 
which there is no one specially marked out. Then comes 
the demand that there be this large number of zero loadings. 
Most batteries of tests will not allow this demand to be 
satisfied, but with some it can just be attained. Only 
these last, it is Thurstone’s conviction, are suitable for 
defining primary factors, and it is his faith that the factors 
thus mathematically defined will be found to be acceptable 
as psychologically separable unitary traits. 

2. An example of six tests of rank 3—To make our 
remarks more definite and concrete, let us suppose that 
we have a battery of six tests whose matrix of correlations 
can be reduced to rank 3. In practice, of course, six tests 
are far too few, and more than three factors quite likely. 
The matrix of loadings given by the “ centroid” system 
contains at first negative quantities. Thus from the 
correlations : 


   |    1      2      3      4      5      6
 1 |    .    .525   .000   .000   .448   .000
 2 |  .525     .    .098   .306   .349   .000
 3 |  .000   .098     .    .133   .314   .504
 4 |  .000   .306   .133     .    .000   .000
 5 |  .448   .349   .314   .000     .    .307
 6 |  .000   .000   .504   .000   .307     .


with the communalities:

      .674   .634   .558   .415   .490   .493


we get by the “centroid” process the matrix of loadings:

        I       II      III
 1    .542    .612    .074
 2    .629    .342   -.348
 3    .529   -.492    .191
 4    .281   -.182   -.550
 5    .628    .143    .274
 6    .429   -.424    .359


It is the factor axes indicated by these loadings that
Thurstone wishes to rotate until there are no negative 
loadings and enough zero loadings to make the position 
uniquely defined. For this last purpose he finds, empiri- 
cally, that it is necessary to require— 

(a) At least one zero loading in each row ; 

(b) At least as many zero loadings in each column as 
there are columns (here three); and 

(c) At least as many XO or OX entries in each pair 
of columns as there are columns. By an XO entry is 
meant a loading in the one column opposite a zero in the 
other. 

“ At least one zero loading in each row.” This means 
that no test may contain all the common factors. In 
making up the battery, then, the experimenter, with some 
idea in his mind as to what the factors are, will endeavour 
to ensure that they are not all present in any one test. 
This would, for example, exclude from a Thurstone battery 
(except as an extra) any very mixed group test, or a mixed 
test like the Binet-Simon which is itself a whole battery 
of varied items. 

“At least as many zeros in each column as there are 
columns,” that is, as there are common factors. This 
means that in a Thurstone battery no factor may be general, 
but must be missing in several tests. 

The requirement as to the number of XO or OX entries
is intended to ensure that the tests are qualitatively
distinct from one another.

Now, these requirements cannot generally be met by a
matrix of loadings. It will in general be impossible to
rotate the axes (keeping them orthogonal) until every
axis is at right angles to r test vectors. The above artificial
example has, however, been constructed so that this can 
be done. 

The correlations were in fact made from the loadings : 


         A       B       C
 1  |    .       .     .821
 2  |    .     .475    .639
 3  |  .718    .206      .
 4  |    .     .644      .
 5  |  .438      .     .546
 6  |  .702      .       .


and the centroid loadings must therefore be capable of 
being rotated rigidly into this form, retaining ortho- 
gonality. 
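Both statements can be verified numerically. The sketch below rebuilds correlations as inner products of rows of the A, B, C loadings, and counts the zeros and the XO or OX entries demanded by requirements (a), (b), and (c). The function names are ours, and blank entries of the table are entered as 0.0.

```python
# Loadings on A, B, C from which the example was built (0.0 = blank).
LOADINGS = [[0.0, 0.0, 0.821],
            [0.0, 0.475, 0.639],
            [0.718, 0.206, 0.0],
            [0.0, 0.644, 0.0],
            [0.438, 0.0, 0.546],
            [0.702, 0.0, 0.0]]

def correlation(i, j, loadings=LOADINGS):
    """Correlation of tests i and j (numbered from 1)."""
    return sum(a * b for a, b in zip(loadings[i - 1], loadings[j - 1]))

def simple_structure_counts(loadings):
    """Zeros per row, zeros per column, XO/OX entries per pair of columns."""
    r = len(loadings[0])
    row_zeros = [sum(1 for x in row if x == 0.0) for row in loadings]
    col_zeros = [sum(1 for row in loadings if row[k] == 0.0) for k in range(r)]
    xo = {}
    for j in range(r):
        for k in range(j + 1, r):
            xo[(j, k)] = sum(1 for row in loadings
                             if (row[j] == 0.0) != (row[k] == 0.0))
    return row_zeros, col_zeros, xo

def meets_requirements(loadings):
    """True when requirements (a), (b) and (c) are all satisfied."""
    r = len(loadings[0])
    row_zeros, col_zeros, xo = simple_structure_counts(loadings)
    return (min(row_zeros) >= 1 and min(col_zeros) >= r
            and min(xo.values()) >= r)

print(round(correlation(1, 2), 3), round(correlation(3, 6), 3))
print(simple_structure_counts(LOADINGS))
print(meets_requirements(LOADINGS))
```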

3. Two-by-two rotation to simple structure.—The problem
for the experimenter, however, is to discover this “simple
structure,” if it exists; he is not, like us, in the position of
knowing that it does exist, and what it is. Thurstone’s
original method was to use two-by-two rotations, in each
rotation endeavouring to obtain some zero loadings. Let us
illustrate by our artificial example, taking first the centroid
factors I and II. Using their centroid loadings as
co-ordinates, we obtain Figure 23. At once we notice that
the test points 3, 4, and 6 are almost collinear on a radius
from the origin, and that if we rotate the axes clockwise
through about 42° the new position of I, labelled I₁ in the
diagram, will almost pass through these test points, while
the new axis II₁ will almost pass through test point 1. On
these new axes, therefore, Tests 3, 4, and 6 will have hardly
any projections on axis II₁; that is, will have hardly any
loadings in a factor along II₁.

[Figure 23.]

From tables we find sin 42° = .669, and cos 42° = .743. We
have then:

          Old loadings         New loadings
           I       II           I₁      II₁
 1       .542    .612        -.007    .817
 2       .629    .342         .239    .675
 3       .529   -.492         .722   -.012
 4       .281   -.182         .331    .053
 5       .628    .143         .371    .526
 6       .429   -.424         .602   -.028

 multipliers   .743   -.669   for I₁ loadings,
               .669    .743   for II₁ loadings.
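The arithmetic of this rotation, and of the second rotation through 49° described in the next paragraph, can be sketched as follows. The function name `rotate` is ours; the sines and cosines are the three-figure table values used in the text.

```python
# Centroid loadings on I, II, III for Tests 1 to 6.
F = [(0.542, 0.612, 0.074), (0.629, 0.342, -0.348),
     (0.529, -0.492, 0.191), (0.281, -0.182, -0.550),
     (0.628, 0.143, 0.274), (0.429, -0.424, 0.359)]

def rotate(x, y, cos_t, sin_t):
    """Loadings on the two new axes after turning both through one angle."""
    return cos_t * x - sin_t * y, sin_t * x + cos_t * y

first, second = [], []
for one, two, three in F:
    i1, ii1 = rotate(one, two, 0.743, 0.669)       # I, II  -> I1, II1
    i2, iii1 = rotate(i1, three, 0.656, 0.755)     # I1, III -> I2, III1
    first.append((i1, ii1))
    second.append((i2, ii1, iii1))

for test, row in enumerate(second, start=1):
    print(test, [round(x, 3) for x in row])
```

Here `first` holds the new loadings after the first rotation and `second` the loadings on I₂, II₁, III₁ after both.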


We have now obtained our desired three zero (or near zero)
loadings in factor II₁. Accepting the approximations to
zero as good enough for the present, we next make Figure 24
from the loadings of I₁ and III in the same way as we made
the former figure. In this, Test 1 falls quite near the
origin. Tests 5 and 6 are approximately on one radius, and
Tests 2 and 4 on another, and these radii are at right angles
to one another. If we rotate the axes I₁ and III rigidly
through a clockwise turn of about 49° they will pass almost
through these radial groups and nearly zero projections will
result.* Using sin 49° = .755 and cos 49° = .656 we perform
a similar calculation to the preceding, using the loadings
I₁ and III as starting-point and obtaining loadings on I₂ and
III₁ (the subscript indicating the number of rotations that
axis has undergone). We have finally, putting our results
together, the table of loadings overleaf, FΛ.†


* The rotation might with advantage have been carried a little
further.

† The matrix symbols, using Thurstone’s notation, are given for
the convenience of mathematical readers. Others should ignore
them. When the tests are many and the centroids few, a saving can
be effected by picking tests equal in number to the factors and
performing two-by-two rotations on their loadings F₁. Let the
resulting loadings be V₁. Then R = F₁⁻¹V₁ can be used as a rotating
matrix on the whole table F of centroid loadings. The tests chosen
to form F₁ should represent different clusters.


[Figure 24.]

         I₂      II₁     III₁
 1  |  -.060    .817    .043
 2  |   .420    .675   -.048
 3  |   .329   -.012    .670
 4  |   .632    .053   -.111
 5  |   .037    .526    .460
 6  |   .124   -.028    .690


Clearly, this is an approximation to the loadings of the
factors A, B, and C which we who are in the secret (as a
real experimenter is not) know to have been used in making
the correlations: III₁ here is A, I₂ here is B, and II₁ is C.
The small loadings are not quite zero, and the other load-
ings not quite the same, but a further set of rotations
would refine the results and bring them nearer to the
A B C values.

4. New rotational method.—When this two-by-two rota-
tional method is used on a large battery of tests, with
perhaps six or seven factors instead of three, it is not
only laborious but somewhat difficult to follow. Thur-
stone has, however, devised a method of rotation which
takes the factors three at a time, and to this we now turn,
still using our small artificial example as illustration. In
this example, since there are only three factors, this new
method leads to a complete solution at once. With more
factors the matter would be more complicated.

If the reader will think of the three centroid factors as
represented by imaginary lines in the room in which he is
sitting (Figure 25), he will be aided in following the
explanation of this new method. Imagine the first
centroid axis to be vertically in the middle of the room,
and the other two centroid axes on the carpet, at right
angles to the first and to each other. The test points are
in various positions in the room space, if we take their three
centroid loadings as co-ordinates and treat the distance from
floor to ceiling as unity. Imagine each test point joined
by a line to the origin (in the middle of the carpet, where
the axes cross). The lengths of these lines are the square
roots of the communalities, and the loadings on the first
centroid factor are their projections on to the vertical axis,
the height, that is, of each test point above the floor.


[Figure 25 (not to scale).]


Thurstone now imagines each of these lines or com- 
munality vectors produced until it hits the ceiling, making 
a pattern of dots on the ceiling. These extended vectors 
now all have unit projection on the first centroid axis, 
for we agreed to call the distance from floor to ceiling 
unity. Their y and z co-ordinates on the ceiling will be
correspondingly larger than their loadings on the second
and third centroid factors, and can be obtained by dividing
each row of the centroid loadings by the first loading. In
our case this gives us the following table, obtained in the
manner just mentioned from the table on page 153.


Extended centroid projections


         I        II       III
 1  |  1.000    1.129     .137
 2  |  1.000     .544    -.553
 3  |  1.000    -.930     .361
 4  |  1.000    -.648   -1.957
 5  |  1.000     .228     .436
 6  |  1.000    -.988     .837


The second and third columns are now the co-ordinates
of those dots on the ceiling of which we spoke. A diagram
of the ceiling, seen from above, is given in Figure 26 and
the important point about it is that the dots form a
triangle.

[Figure 26.]

If the reader will now picture this triangle as drawn on
the ceiling of his room, and remember that the origin, where
the centroid axes crossed, is in the middle of the carpet,
he can next imagine an inverted three-cornered pyramid,
with the triangle on the ceiling as its base, the origin in
the middle of the carpet as its apex and the communality
vectors 1, 4, and 6 as its edges. The vector 5 lies on one
of the faces of this pyramid; vector 2 lies on another;
vector 3 lies on the remaining face, all
springing from the origin and going up to the ceiling.
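The extended co-ordinates which place the dots on the ceiling come, as described above, from dividing each row of the centroid loadings by its first loading. A minimal sketch, with the function name `extend` being ours:

```python
# Centroid loadings on I, II, III for Tests 1 to 6.
CENTROID = [[0.542, 0.612, 0.074],
            [0.629, 0.342, -0.348],
            [0.529, -0.492, 0.191],
            [0.281, -0.182, -0.550],
            [0.628, 0.143, 0.274],
            [0.429, -0.424, 0.359]]

def extend(row):
    """Lengthen a test vector until its first co-ordinate is unity."""
    first = row[0]
    return [x / first for x in row]

EXTENDED = [extend(row) for row in CENTROID]
for test, row in enumerate(EXTENDED, start=1):
    # The last two numbers are the y and z co-ordinates on the ceiling.
    print(test, [round(x, 3) for x in row])
```

A test with a zero first loading could not be extended in this way; none occurs in the example.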

5. Finding the new axes—If now we choose for new 
axes (in place of the centroid axes) three lines at right 
angles respectively to the three plane faces of our pyramid, 
the test projections on these axes will clearly have the 
zeros we desire. The three vectors 1, 2, and 4 all lie in 
one face, and will have zero projections on the axis A’ 
at right angles to that face. The vectors 1, 5, and 6 will 
have zero projections on the line B’ at right angles to their 
face. The vectors 3, 4, and 6 will have zero projections on 
C’ at right angles to their face. The reader should 
visualize these new axes in his room. It remains to be 
shown how the other, non-zero, projections are to be 
calculated, and to inquire whether these new axes are 
orthogonal, and whether they can be identified with the 
original A, B, and C. The first step is to obtain the equa- 
tions of the three sides of the triangle in the diagram. 
Where there are many tests and the dots are not perfectly
collinear, one plan is to draw a line through them by eye,
and measure the distances a and b it cuts off on the axes,
then using the equation


        y/a + z/b = 1.


Or we can write down the equations of the lines joining
points at the corners, either actual test points, or the places
where our lines intersect, using the equation

        (lv - mu) + (m - v) y + (u - l) z = 0

when l, m are the co-ordinates of one corner, and u, v of
another. We obtain in our case:

        -2.121 + 2.094y - 1.777z = 0    for line 1, 2, 4
        -1.080 +  .700y + 2.117z = 0    for line 1, 5, 6
         2.476 + 2.794y +  .340z = 0    for line 4, 3, 6

where y means the extended II, and z the extended III.
Before we go further we have to divide each equation
through by the root of the sum of the squares of its
coefficients, so that the new coefficients sum to unity when
squared. This is called normalizing and is necessary in
order to keep the communalities right and for other reasons.
The equations then are:

        -.611 + .603y - .512z = 0    (1)
        -.436 + .283y + .854z = 0    (2)
         .660 + .745y + .091z = 0    (3)
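The two steps, writing the equation of a line through two corners and then normalizing it, can be sketched thus. The function names are ours; the corners are the extended co-ordinates (y, z) of Tests 1 and 4.

```python
import math

def line_through(p1, p2):
    """Coefficients (k, cy, cz) of  k + cy*y + cz*z = 0  through two points."""
    l, m = p1
    u, v = p2
    return (l * v - m * u, m - v, u - l)

def normalize(coeffs):
    """Divide by the root of the sum of squares of the coefficients."""
    length = math.sqrt(sum(c * c for c in coeffs))
    return tuple(c / length for c in coeffs)

corner_1 = (1.129, 0.137)      # Test 1
corner_4 = (-0.648, -1.957)    # Test 4

raw = line_through(corner_1, corner_4)
unit = normalize(raw)
print([round(c, 3) for c in raw])
print([round(c, 3) for c in unit])
```

Both corners satisfy the equation exactly, whatever pair of points is supplied, since each is substituted back into its own coefficients.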
and it is clear, from the way in which they have been
reached, that these equations will be satisfied by the ex-
tended co-ordinates of certain of the rows in the table on
page 158. Consider the first equation and write its
coefficients above the columns of that table, placing -.611
over the first column, as shown at the foot of page 159.


        -.611    .603   -.512  |  Weighted
          x       y       z    |    sum
 1  |   .542    .612    .074   |    .000
 2  |   .629    .342   -.348   |    .000
 3  |   .529   -.492    .191   |   -.718
 4  |   .281   -.182   -.550   |    .000
 5  |   .628    .143    .274   |   -.438
 6  |   .429   -.424    .359   |   -.701


If we multiply each column by the multiplier above it
and add the rows we get the quantities shown on the right
for comparison with page 154. The zeros are in the right
places for factor A. The other loadings are, however,
negative, but that can be easily put right by changing all
the signs of the multipliers, which we are at liberty to do.
Similarly, using eqns. (2) and (3) we get the loadings of
factors B and C exactly, except for an occasional difference
due to rounding off at the third decimal place. We have,
indeed, found the matrix product FΛ,


| .542   .612   .074 |                             |   .      .    .821 |
| .629   .342  -.348 |   |  .611   .436   .660 |   |   .    .475   .639 |
| .529  -.492   .191 | × | -.603  -.283   .745 | = | .718   .206     .  |
| .281  -.182  -.550 |   |  .512  -.854   .091 |   |   .    .644     .  |
| .628   .143   .274 |                             | .438     .    .546 |
| .429  -.424   .359 |                             | .702     .      .  |
except, as has been already said, for occasional dis- 
crepancies in the third decimal place. The procedure we 
have described has enabled us to discover this last matrix, 
with which, in fact, we began. And by analogy (is the 
deduction sound?) an experimenter with experimental 
data who follows this procedure and reaches simple 
structure concludes that that is how his correlations were 
made. Certainly that is how they may have been made. 
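The product can be checked by machine, and the same helper serves for the test of orthogonality taken up in the next paragraph. This is a sketch only; the names `matmul` and `transpose` are ours, and the entries are the three-figure values of the text.

```python
F = [[0.542, 0.612, 0.074],
     [0.629, 0.342, -0.348],
     [0.529, -0.492, 0.191],
     [0.281, -0.182, -0.550],
     [0.628, 0.143, 0.274],
     [0.429, -0.424, 0.359]]

# Rotating matrix; each column is a normalized equation, with the
# signs chosen so that the large loadings come out positive.
LAMBDA = [[0.611, 0.436, 0.660],
          [-0.603, -0.283, 0.745],
          [0.512, -0.854, 0.091]]

def matmul(a, b):
    """Ordinary row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

V = matmul(F, LAMBDA)                       # loadings on A, B, C
G = matmul(transpose(LAMBDA), LAMBDA)       # cosines between A, B, C

for row in V:
    print([round(abs(x), 3) for x in row])
for row in G:
    print([round(abs(x), 2) for x in row])
```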
The matrix Λ beginning with .611 is the rotating
matrix which turns the axes I, II, III into the new posi-
tions A, B, C. Its columns are the direction-cosines of
A, B, and C with reference to the orthogonal system
I, II, III. Are A, B, and C orthogonal? The cosines of
the angles between them can by a well-known rule be
found by premultiplying the rotating matrix by its
transpose. When we do so we find Λ'Λ = I, viz.:


| .611  -.603   .512 |   |  .611   .436   .660 |   | 1  .  . |
| .436  -.283  -.854 | × | -.603  -.283   .745 | = | .  1  . | = I
| .660   .745   .091 |   |  .512  -.854   .091 |   | .  .  1 |


(again allowing for third decimal place discrepancies).
That is to say, the angles between A, B, and C have zero
cosines, they are right angles.

The axes A, B, and C were drawn at right angles to the
three planes which form the pyramid mentioned above, and
therefore these three planes are also at right angles to one
another. (Our rough sketch in Figure 25 [...] pyramid too
acute.) It follows that A, B, and C are actually
the edges of the pyramid.
In our example (though this [...] and C through Test 1,
[...] the factors, for e[...] the common-factor sp[...]
we have called a te[...] projected on t[...] space. The
complete test vectors ar[...] more dimensions, of which the
n-factor space is a subspace.

[...] preliminary rotation [...] centroid factor. [...] of
the common-factor [...] are more [...] are not so [...]
space is, for example, [...] of extended vectors, in
addition to its fir[...], will have three other [...] the
ceiling of our room, in our [...] three-dimensional, a[...]
right angles to the first centroid axis. On paper its
dimensions can only be graphed two at a time, no complete
triangle will be visible among the dots, [...] of dots will
be seen to be collinear, lines can be [...]n through them,
and a procedure similar to that out[...] followed. This
will become clearer when we [...] four-dimensional example.
First, however, it is [...] three-dimensional [...] a device
which facilitates the work on higher [...] rotation. It is
[...] and we are [...] more than three [...]
to a position where each of them is equally inclined to the
original first centroid axis. In our imagined room the first 
centroid axis ran vertically from the middle of the floor 
to the middle of the ceiling, while the other two were 
drawn on the floor itself. Imagine all three (retaining 
their orthogonality) to be moved, on the origin as pivot, 
until they are equally inclined to the vertical so that they 
enclose the inverted pyramid of Figure 25. That is a
Landahl rotation. The lines through the test points have 
not moved. They remain where they were, and still hit 
the ceiling in the same pattern of dots. The projections of 
the extended vectors on to the original first centroid 
axis all still remain unity. But for the next step in this 
method we need their projections on to the Landahl axes. 
We obtain these by post-multiplying the matrix of cen-
troid extended loadings by a Landahl matrix, an orthogonal
matrix with each element in its first row equal to 1/√c,
where c is the order of the matrix; that is, its number of
rows or columns (Landahl, 1938). We need a Landahl
matrix of order 3, for example:


| .577    .577    .577 |
| .816   -.408   -.408 |
| .000    .707   -.707 |


The element -577 is the cosine of the angle which each axis 
makes, after rotation, with the original position of the first 
centroid axis. 

When the table of extended vector projections on page 
157 is post-multiplied by the above matrix, the table on 
page 163 results, giving the projections of the extended 
vectors on to the Landahl axes L, M, N. 
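The post-multiplication just described can be sketched as follows. The names are ours, and the extended projections are entered to three decimals, so the results may differ from hand-worked figures by a unit or two in the third place.

```python
# Extended centroid projections (I, II, III) for Tests 1 to 6.
EXTENDED = [[1.0, 1.129, 0.137],
            [1.0, 0.544, -0.553],
            [1.0, -0.930, 0.361],
            [1.0, -0.648, -1.957],
            [1.0, 0.228, 0.436],
            [1.0, -0.988, 0.837]]

# Landahl matrix of order 3: every element of the first row is 1/sqrt(3).
LANDAHL_3 = [[0.577, 0.577, 0.577],
             [0.816, -0.408, -0.408],
             [0.000, 0.707, -0.707]]

def post_multiply(table, matrix):
    """Each row of the table times the matrix (row-by-column product)."""
    return [[sum(x * m for x, m in zip(row, col)) for col in zip(*matrix)]
            for row in table]

LMN = post_multiply(EXTENDED, LANDAHL_3)
for test, row in enumerate(LMN, start=1):
    print(test, [round(x, 3) for x in row])

# The three projections of every extended vector add to 3 x .577,
# because the second and third rows of the Landahl matrix sum to zero.
print([round(sum(row), 3) for row in LMN])
```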

Projections on Landahl axes*


         L        M        N
 1  |  1.498     .213     .020
 2  |  1.021    -.036     .746
 3  |  -.182    1.212     .701
 4  |   .048    -.542    2.225
 5  |   .760     .792     .176
 6  |  -.229    1.572     .388


* After a Landahl adjustment the axes are not infrequently
already near simple structure, as here. It is sometimes worth while
to rotate them slowly round the original first centroid, like spinning
an umbrella, to improve the approximation to zero entries. This
can be done by an orthogonal matrix whose columns sum to unity,
as e.g.


|  .9900   -.0946    .1046 |
|  .1046    .9900   -.0946 |
| -.0946    .1046    .9900 |


or its transpose: and the rotation will be the slower, the nearer the
diagonal elements are to unity.

From this table three diagrams LM, LN, and MN can
be made, and the reader is advised to draw them. Each
of them shows a triangular distribution of dots and in this
simple three-dimensional example only one of them is
needed. But in a multi-dimensional problem several are
needed, and as a rule only one line is used on each diagram
employed. Here, from the LN diagram we find the
equations of the three sides of the triangle to be:


        -2.205l - 1.450n + 3.332 = 0
          .368l + 1.727n -  .586 = 0
         1.837l -  .277n +  .528 = 0


We want to make these homogeneous in l, m, and n, and so
we add, after each of the numerical terms, the factor
.577 (l + m + n), which equals unity. The equations
then are:


         -.282l + 1.923m +  .473n = 0
          .030l -  .338m + 1.389n = 0
         2.142l +  .305m +  .028n = 0


After normalizing, these become:

         -.141l + .961m + .236n = 0
          .021l - .236m + .971n = 0
          .990l + .141m + .013n = 0

Writing the coefficients as columns in a matrix, and
premultiplying by Landahl’s matrix (since at an earlier
stage we post-multiplied by it) we obtain:


|  .609    .436    .660 |
| -.603   -.283    .745 |
|  .513   -.853    .090 |


the same matrix Λ as we arrived at (page 160) without the
use of Landahl’s rotation. The advantage of using a
Landahl rotation appears only in problems with more
than three common factors. The reader can readily make
a Landahl matrix of any required order, say 5. Fill the
first row with the root reciprocal of 5, .447. Complete
the first column by putting in the second place .894
(because .447² + .894² = 1), and below that zeros. The
second row must then be completed with equal elements,
all negative, such that the row sums to zero. Then the
second column is completed in a similar way, and the third
row, and so on. The reader should finish it. There are
alternative forms possible, one of which is used below.
An unfinished Landahl matrix:


| .447    .447    .447    .447    .447 |
| .894   -.224   -.224   -.224   -.224 |
| .000    .866   -.289   -.289   -.289 |
| .000    .000                         |
| .000    .000                         |
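The recipe lends itself to a short routine. The sketch below is one way of carrying it out, under the assumption that each new row consists of zeros, then one positive element, then equal negative elements summing with it to zero; the function name `landahl_matrix` is ours.

```python
import math

def landahl_matrix(c):
    """Orthogonal c x c matrix whose first row is 1/sqrt(c) throughout,
    built row by row as in the recipe above."""
    rows = [[1.0 / math.sqrt(c)] * c]
    for k in range(1, c):            # k = 1 gives the second row
        remaining = c - k            # number of equal negative elements
        lead = math.sqrt(remaining / (remaining + 1.0))
        tail = -1.0 / math.sqrt(remaining * (remaining + 1.0))
        rows.append([0.0] * (k - 1) + [lead] + [tail] * remaining)
    return rows

for row in landahl_matrix(5):
    print([round(x, 3) for x in row])
```

For order 3 the routine reproduces the matrix used earlier in this chapter; other Landahl matrices of the same order exist, as the text remarks.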


7. A four-dimensional example.—The following example 
of a problem with four common factors is only partly 
worked out, so that the reader can finish it as an exercise. 
It also is an artificial example, and orthogonal simple 
structure can be arrived at. The centroid analysis gave
four centroid factors with the loadings shown in this table:

Centroid loadings F


          I       II      III      IV

 1      .727    .517    .094    .126
 2      .575    .105    .553    .049
 3      .810    .289    .246   -.246
 4      .588    .417   -.367   -.382
 5      .524   -.583   -.450    .183
 6      .549   -.435    .398   -.013
 7      .624   -.318   -.187   -.254
 8      .594   -.551    .239    .084
 9      .626    .252   -.169    .562
10      .645    .307   -.357   -.109


After these have been “extended” (i.e. divided in each
row by the first loading) they were post-multiplied by a
Landahl matrix, one of the alternative forms, viz.:


| .5    .5    .5    .5 |
| .5    .5   -.5   -.5 |
| .5   -.5    .5   -.5 |
| .5   -.5   -.5    .5 |


and the resulting projections on the Landahl axes were
thus found to be:

          L        M        N        P

 1  |   1.007     .704     .122     .166
 2  |   1.115     .068     .848    -.030
 3  |    .679     .678     .625     .018
 4  |    .218    1.492     .158     .132
 5  |   -.311     .199     .453    1.660
 6  |    .455    -.247    1.270     .522
 7  |   -.107     .598     .808     .701
 8  |    .308    -.235    1.094     .833
 9  |   1.015     .387    -.285     .883
10  |    .376    1.099     .070     .454


Six diagrams can be made, and it is advisable to draw
them all, though not all are necessary. The LN diagram
is shown in Figure 27. We scan it for collinear points
(not necessarily radial) which have all or nearly all the other
points on one side of their line, and note the line 5, 4, 10, 9.
Its equation is readily found to be approximately:

        .738l + 1.327n - .371 = 0.

We make this homogeneous by substituting for unity, after
the numerical term .371, the quantity .5 (l + m + n + p),
for .5 is the cosine of the angle each of the Landahl axes
makes with the original first centroid axis. This gives us
the equation (not yet normalized):

        .553l - .185m + 1.141n - .185p = 0.
Three more equations are needed, and one of them can 
indeed be obtained from the same diagram, on which 
points 5, 7, 8, 6 are very nearly collinear. The reader is 
advised to draw the remaining diagrams and complete the 
calculations following the steps of our previous example. 
The above equation refers to a line which makes a fairly 
big angle with N. It is desirable to look for the remaining 
three lines making large angles (approaching right angles) 
with L, M, and P. 
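The step of making an equation homogeneous is the same in three and in four dimensions, and can be sketched thus. The function name is ours; axes which do not appear on the diagram are given the coefficient 0.0.

```python
def make_homogeneous(coeffs, constant, cosine):
    """Replace the numerical term by constant * cosine * (sum of all axes).

    coeffs   -- coefficient of every Landahl axis in the line's equation
    constant -- the numerical term of that equation
    cosine   -- cosine of the angle between each Landahl axis and the
                original first centroid axis (1 / sqrt of the order)
    """
    return [c + constant * cosine for c in coeffs]

# Line 5, 4, 10, 9 on the LN diagram:  .738 l + 1.327 n - .371 = 0
four_factor = make_homogeneous([0.738, 0.0, 1.327, 0.0], -0.371, 0.5)

# First side of the triangle in the three-factor example, whose
# numerical term is 1.498 x 2.225 - .020 x .048 = 3.332.
three_factor = make_homogeneous([-2.205, 0.0, -1.450], 3.332, 0.577)

print([round(c, 4) for c in four_factor])
print([round(c, 4) for c in three_factor])
```

The resulting coefficients have still to be normalized before they are used as a column of the rotating matrix.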

It will be remembered that in our earlier example the 
sign of one equation had to be changed at the end of the 
calculation because large negative values were appearing 
in the final matrix of loadings. This can be obviated 
by attending to the following rule. If the other test-points
are on the same side of the line as the origin the numerical
term must be positive in the equation; if they are on the
side remote from the origin the numerical term must be
negative. In the adjacent diagram, the origin and the
other points are on opposite sides of the line through
5, 4, 10, 9 and therefore the numerical term must be
negative, as it is (-.371). Had it been positive all the
signs of the equation would have required to be changed.

[Figure 27.]

8. Ledermann’s method of reaching simple structure.— 
Ledermann has pointed out that when simple structure
can be attained (whether orthogonal or oblique) then as
many r-rowed principal minors of the reduced correlation
matrix must vanish as there are common factors; and
that it follows that the same number of vanishing deter-
minants must be discoverable in the table of centroid
loadings. Thus, for example, in the table of centroid
loadings on page 153 the three determinants composed
respectively of rows 1, 2, and 4; of rows 1, 5, and 6; and
of rows 3, 4, and 6 all vanish, and these rows are where the
zeros come in the three columns of the simple structure.
This gives an alternative method of reaching simple
structure. Test every possible r-rowed determinant in
the centroid table of r factors. If r of them are discovered
to vanish, then simple structure may be and probably is
possible. Each of these vanishing determinants will
provide a column of the rotating matrix Λ, for which pur-
pose we delete any one of its rows and calculate all the r - 1
rowed minors from what is left. The column has then to 
be normalized. This process works equally well for 
oblique simple structure (see next Chapter). Its draw- 
back, when the number of factors is large, is the necessity 
of calculating so many determinants to discover those that 
vanish. 
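For the three-factor example the search is short enough to sketch in full. The code below tries every three-rowed determinant of the centroid table, lists those that vanish to the accuracy of three-figure loadings, and forms a column of the rotating matrix from the signed two-rowed minors of two rows of one such determinant. The function names and the tolerance of .002 are our assumptions.

```python
from itertools import combinations
import math

F = [[0.542, 0.612, 0.074],
     [0.629, 0.342, -0.348],
     [0.529, -0.492, 0.191],
     [0.281, -0.182, -0.550],
     [0.628, 0.143, 0.274],
     [0.429, -0.424, 0.359]]

def det3(a, b, c):
    """Determinant whose rows are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def vanishing_triples(table, tol=0.002):
    """Sets of three tests (numbered from 1) whose determinant is near zero."""
    found = []
    for triple in combinations(range(1, len(table) + 1), 3):
        rows = [table[i - 1] for i in triple]
        if abs(det3(*rows)) < tol:
            found.append(triple)
    return found

def column_from(u, v):
    """Signed two-rowed minors of the rows u, v, normalized."""
    minors = [u[1] * v[2] - u[2] * v[1],
              u[2] * v[0] - u[0] * v[2],
              u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(m * m for m in minors))
    return [m / length for m in minors]

triples = vanishing_triples(F)
column_a = column_from(F[0], F[1])   # rows 1 and 2 of determinant (1, 2, 4)
print(triples)
print([round(x, 3) for x in column_a])
```

The sign of the whole column is at our disposal, as with the normalized equations of the earlier method.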

9. Limits to the extent of factors.*—Orthogonal simple 
structure requires that no factor shall extend through many 
tests, and it is possible to decide beforehand, from the 
correlations, whether factors running through not more 
than s tests cach are adequate to give the measured correla- 
tions, leaving n — s zeros. They will not as a rule be able 
to do so if the average correlation exceeds (s — 1)/(m — 1): 
more exactly, not if the largest latent root of the matrix 
is larger than s. If these rules are to be applied when 
communalities are used, as is the case when testing whether 
orthogonal simple structure is possible, the matrix should 
first be “ corrected for communality,” i.e. each r must be 
divided by the square root of the product of the two com- 
munalities concerned. Approximations to the largest 
latent root of a matrix of correlations, when the entries are
all positive, are—

          sum of the whole matrix
          -----------------------
                     n

or more accurately—

   sum of the squares of the column totals
   ---------------------------------------
           sum of the whole matrix

* A brief summary of a chapter with this title in previous editions.
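The two approximations to the largest latent root can be tried on a small matrix. The sketch below is illustrative only: the function name and the three-test matrix with every correlation equal to .5 are assumptions, chosen because for equal correlations the largest latent root is exactly 1 + (n - 1)r, here 2, so that both approximations can be checked against it.

```python
def root_approximations(matrix):
    """Return the two approximations to the largest latent root."""
    n = len(matrix)
    column_totals = [sum(row[j] for row in matrix) for j in range(n)]
    total = sum(column_totals)  # sum of the whole matrix
    first = total / n
    second = sum(t * t for t in column_totals) / total
    return first, second


# Hypothetical matrix: three tests, unit diagonal, all correlations .5.
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]

first, second = root_approximations(R)
```

When communalities are in use the matrix would first have to be corrected for communality, as the text describes, before either approximation is compared with s.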


An exact test for the possibility of orthogonal simple 
structure has been given (Ledermann, 1936) and is des- 
cribed in the Appendix, page 367, but it requires a pro- 
hibitive amount of calculation. 

Even, however, when orthogonal simple structure cannot 
be attained with orthogonal factors, it may be possible to 
reach it with oblique factors. 

10. Leading to oblique factors.—In this chapter we have 
kept our factors orthogonal; that is, independent, un- 
correlated with one another. It is natural to desire them 
to be different qualities, and convenient statistically. In 
describing a man, or an occupation, it would seem to be 
both confusing and uneconomical to use factors which, 
as it were, overlapped. Yet in situations where more 
familiar entities are dealt with, we do not hesitate to use 
correlated measures in describing a man. For instance, 
we give a man’s height and weight, although these are 
correlated qualities. 

Often, moreover, a battery of tests which will not 
permit simple structure to be reached if orthogonal 
factors are insisted on will nevertheless do so if the factors 
are allowed to sag away a little from strict orthogonality. 
Even as early as in Vectors of Mind, Thurstone expressly 
permitted this. It can clearly be defended on the ground 
that even if the factors were uncorrelated in the whole 
population, they might well be correlated to some extent in 
the sample of people actually tested. I was at one time 
under the impression that this comparatively slight de- 
parture from orthogonality was all that was contemplated 
by Thurstone. But he and his fellow-workers now have 
the courage of their convictions, and permit factors to 
depart from orthogonality as much as is necessary to attain 
simple structure, even if they are then found to be quite
highly correlated. A chapter on these oblique factors* is
therefore necessary, and out of them arise Thurstone’s 
“ second order factors.” 

11. Parallel proportional profiles —A method which, like 
Thurstone’s simple structure, is meant to enable us to 
arrive at factors which are real entities, or to check 
whether our hypotheses about the factor composition of 
tests are correct, has been put forward by R. B. Cattell 
(1944b, 1946), and has interesting possibilities which its 
author will no doubt develop. The essence of his idea 
is that “if a factor is one which corresponds to a true 
functional unity, it will be increased or decreased ‘as a 
whole’,” and therefore if the same tests are given under
two different sets of circumstance, which favour a certain 
factor more in one case and less in the other, the loadings 
of the tests in that factor should all change in the same pro- 
portion. Experimental trials of this principle may be ex- 
pected soon from its author. Among “ different circum- 
stances ” he mentions different samples of subjects, differ- 
ing, say, in age or sex, and different methods of scoring, or 
different associated tests in the battery. But he prefers 
another kind of change of circumstance ; namely a change 
“from measures of static, inter-individual differences to 
measures from other sources of differences in the same 
variables.” He instances, among his examples, inter- 
correlating changes in scores of individuals with time, or 
intercorrelating differences of scores in twins. We may 
thus have two, or several, centroid analyses, and the mathe- 
matical problem is to find rotations which will leave the 
profile of loadings of a certain factor similar in all the factor 
matrices. It may even be that the profiles of several fac- 
tors could be made similar. These factors would then 
satisfy Cattell’s requirement as corresponding to “true
functional unities.” The necessary modes of calculation 
to perform these rotations have not yet been more than 
adumbrated, however. 


* It must be clearly understood that this obliquity or correlation 
of factors is quite a different matter from the correlation of estimates,
even of orthogonal factors, due to the excess of factors over tests
described on pages 287 to 242. 



CHAPTER XII 
OBLIQUE FACTORS 


1. Pattern and structure—So long as the factors are 
orthogonal, the loadings in the matrix of loadings are also 
the correlations between the factor and the tests, but this 
ceases to be the case when the factors are correlated. The 
word “ loading ” continues to be used for the coefficients 
such as l, m, and n in equations like— 


z = lα + mβ + nγ


and the matrix or table of these is called a pattern, while 
the matrix of correlations between tests and factors is 
called a structure. The entries in a structure are pro- 
jections from a point on to certain axes. The entries in a 
pattern are the oblique co-ordinates of that point along 
those axes. The two are only identical if the axes are 
orthogonal. 
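A small numerical sketch may make the distinction concrete. It assumes standardized factors and a hypothetical test with loadings .6 and .3 on two factors correlated .5; none of these figures come from the text, and the function name is only illustrative. The correlation of the test with each factor, an entry in the structure, is then its loading on that factor plus its other loading multiplied by the correlation between the factors.

```python
def structure_from_pattern(loadings, factor_correlations):
    """Correlations of one test with each factor, given its pattern row."""
    k = len(loadings)
    return [sum(loadings[j] * factor_correlations[j][i] for j in range(k))
            for i in range(k)]


pattern_row = [0.6, 0.3]              # hypothetical loadings l and m

oblique = [[1.0, 0.5], [0.5, 1.0]]    # factors correlated .5
orthogonal = [[1.0, 0.0], [0.0, 1.0]]

structure_oblique = structure_from_pattern(pattern_row, oblique)
structure_orthogonal = structure_from_pattern(pattern_row, orthogonal)
```

With the oblique factors the structure entries come to about .75 and .60, both larger than the loadings; with orthogonal factors they coincide with the loadings, as stated above.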

Moreover, as soon as the factors become oblique, it 
becomes necessary to distinguish between “ reference 
vectors ” and “ primary factors.” The reference vectors 
are the positions to which the centroid axes have been
rotated so that the test-projections on to them include a 
number of zeros. Each reference vector is at right angles 
to a hyperplane containing a number of communality 
vectors. A hyperplane is a space of one dimension less 
than the common-factor space. In our first example in 
Chapter XI the hyperplanes were ordinary planes, the 
faces of the three-cornered pyramid there referred to (see 
page 157) and each reference vector was at right angles to 
one of those faces. 

The primary factor corresponding to a given reference 
vector is the line of intersection of all the other hyper- 
planes, excluding, that is, the hyperplane at right angles to 
the reference vector. In our three-dimensional common-factor
space the primary factor was the edge of the pyramid
where those two faces met, excluding that face to
which the reference vector was orthogonal. 

Now, when the reference vectors turn out to be at right 
angles to each other, as they did in that example, each 
reference vector is identical with its own primary factor. 
But not when the reference vectors turn out to be oblique. 
In Chapter XI we did not distinguish them, and called their 
common line the “ factor.” But in this chapter the dis- 
tinction must be kept clearly in mind. It is the primary 
factors Thurstone wants. The reference vectors are only 
a means to an end. 

Thurstone’s second method of rotation described in 
Chapter XI, the method in which the communality 
vectors are “ extended,” and lines drawn on the diagrams 
which are not necessarily radial lines, will not keep the 
axes orthogonal, but seeks for the axes on which a number 
of projections are zero, regardless of whether the resulting 
directions are orthogonal or oblique. In general they will 
be oblique, and the examples worked in Chapter XI only 
gave orthogonal simple structure because they had been 
devised so as to do so. The test of orthogonality is that 
the matrix of rotation, premultiplied by its transpose, 
gives the unit matrix (see page 160). Or in other words, 
that the inner products of the columns of the rotating 
matrix are all zero. They are the cosines of the angles 
between the reference vectors, and the cosine of 90° is 
zero. 

2. Three oblique factors—To illustrate Thurstone’s 
method when the resulting factors are oblique we shall 
next work an example devised to give three oblique 
common factors. Consider this matrix of correlations :


 1 2 3 4 5 6 7

which, with guessed communalities, gives these centroid
loadings :

                    F
           I       II      III
   1     .449    -.682     .165
   2     .825    -.478    -.129
   3     .906     .336     .020
   4     .846     .133     .457
   5     .808     .208    -.412
   6     .697     .336     .335
   7     .767     .173    -.468


When these projections on the centroid axes are “ ex- 
tended,” that is, when each row is divided by the first 
loading in that row, we obtain this table : 


           I        II       III
   1     1.000   -1.519     .367
   2     1.000    -.579    -.156
   3     1.000     .371     .022
   4     1.000     .157     .540
   5     1.000     .257    -.510
   6     1.000     .482     .481
   7     1.000     .226    -.610

The columns II and III in this table represent the co-
ordinates of the “dots on the ceiling” in our analogy of
Chapter XI, p. 157. When we make a diagram of them we
obtain Figure 28. We see that a triangular formation is
present, and we draw the dotted lines shown.

Figure 28.

It is not essential, it may be remarked in passing, that
there be no points elsewhere than on
the lines, provided they are additional to those required to
fix the simple structure. Had it not been for the desirability 
of keeping the example small we would have increased the 
number of tests, and not only arranged for further points 
to fall on these lines, but also included some whose dots 
fell inside the triangle, representing tests which involve all 
three factors. 

We find the equations of these lines to be approximately


 .475 +  .50y  +  .95z  = 0   (line 1, 2, 7)
1.113 +  .183y - 2.119z = 0   (line 1, 4, 6)
 .403 - 1.091y +  .256z = 0   (line 7, 5, 3, 6)


The coefficients of each equation have to be “ norma- 
lized,” that is, reduced proportionately so that the sum of 
their squares is unity (for they are to be direction cosines). 
These normalized coefficients are then written as columns
in a matrix as follows :


   .405    .464    .338
   .426    .076   -.916    = Λ
   .809   -.883    .215
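The normalizing step can be checked with a few lines of arithmetic. The sketch below takes the three sets of coefficients from the equations of the lines as read above; the function and variable names are assumptions for illustration.

```python
from math import sqrt


def normalize(coefficients):
    """Scale the coefficients so that the sum of their squares is unity."""
    length = sqrt(sum(c * c for c in coefficients))
    return tuple(c / length for c in coefficients)


# Constant term, coefficient of y, coefficient of z for each line.
lines = [
    (0.475, 0.50, 0.95),      # line 1, 2, 7
    (1.113, 0.183, -2.119),   # line 1, 4, 6
    (0.403, -1.091, 0.256),   # line 7, 5, 3, 6
]

# Each normalized triple is one column of the rotating matrix.
columns = [normalize(line) for line in lines]
```

Rounded to three places the three triples agree with the three columns of the matrix given above.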


The table of centroid loadings on page 172 must now be 
post-multiplied by this rotating matrix to obtain the 
projections of the tests on the three reference vectors which 
are at right angles to the planes defined by the dotted lines 
in our diagram. We obtain this table : 


                   FΛ
 (Simple) Structure on the Reference Vectors

           L'      B'      D'
   1     .025    .011    .812
   2     .026    .460    .689
   3     .526    .428    .003
   4     .769   -.001    .262
   5     .083    .755   -.006
   6     .696    .053    .000
   7     .006    .782    .000

We have labelled the columns L’, B’, and D’ for a reason
which will become apparent later, when we explain how 
the correlations were, in fact, made. This table is a simple 
structure, formed by the projections on the reference 
vectors. It has a zero (or near-zero) in each row, and 
three or more in each column, in the positions to be 
anticipated from Figure 28; for example, tests 3, 5, 6,
and 7, which are collinear in the figure, have zeros in 
column D’. 
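The post-multiplication that yields this table can be reproduced as a check. The sketch assumes the centroid loadings and the rotating matrix as read from the tables above (with .335 taken for the third loading of Test 6); the helper name `matmul` is illustrative.

```python
# Centroid loadings F (7 tests x 3 centroid axes).
F = [
    [0.449, -0.682, 0.165],
    [0.825, -0.478, -0.129],
    [0.906, 0.336, 0.020],
    [0.846, 0.133, 0.457],
    [0.808, 0.208, -0.412],
    [0.697, 0.336, 0.335],
    [0.767, 0.173, -0.468],
]

# Rotating matrix; each column holds the direction cosines of one
# reference vector.
ROTATION = [
    [0.405, 0.464, 0.338],
    [0.426, 0.076, -0.916],
    [0.809, -0.883, 0.215],
]


def matmul(a, b):
    """Ordinary matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Projections of the tests on the three reference vectors.
V = matmul(F, ROTATION)
```

To three places the rows of `V` agree with the table, near-zeros included.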

Now let us test the angles between the reference vectors. 
To do this we premultiply the rotating matrix by its 
transpose 


          Λ'                      Λ                       C

 .405   .426   .809      .405   .464   .338        1    -.494   -.079
 .464   .076  -.883  ×   .426   .076  -.916  =   -.494     1    -.103
 .338  -.916   .215      .809  -.883   .215      -.079  -.103     1


This gives the cosines of the angles between the reference 
vectors and we see that they are obtuse. The angles are 
approximately : 


    .     120°    95°
  120°     .      96°
   95°    96°     .
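The cosines and angles can be verified from the rotating matrix alone. The sketch below assumes the matrix as read above; the names `gram`, `C` and `angles` are illustrative.

```python
from math import acos, degrees

# Rotating matrix; columns are the reference vectors L', B', D'.
ROTATION = [
    [0.405, 0.464, 0.338],
    [0.426, 0.076, -0.916],
    [0.809, -0.883, 0.215],
]


def gram(matrix):
    """The matrix premultiplied by its transpose: inner products of columns."""
    columns = list(zip(*matrix))
    return [[sum(p * q for p, q in zip(a, b)) for b in columns]
            for a in columns]


C = gram(ROTATION)

# Angle in degrees between each pair of reference vectors.
angles = {(i, j): degrees(acos(C[i][j]))
          for i in range(3) for j in range(i + 1, 3)}
```

Had the rotation been orthogonal, every off-diagonal entry of `C` would have been zero and every angle 90°.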


As soon as we know that the reference vectors are not 
orthogonal, we have to take account of the fact that the 
primary factors are not identical with them. Each prim- 
ary factor is the line in which the hyperplanes intersect, 
excluding that hyperplane to which the corresponding 
reference vector is orthogonal. In a three-dimensional 
common-factor space like ours the primary factors lie 
along the edges of the pyramid which the extended vectors 
form.

Let us return to our mental picture, which the reader 
can place in the room in which he is sitting. The origin,
immediately below the point O in Figure 28, is in the middle
of the carpet. Figure 28 itself is on the ceiling, seen from
above as though translucent. The radial lines with 
arrowheads are the projections of the primary factors on 
to the ceiling. The projections of the reference vectors 
are not drawn, to avoid confusion in the figure. They 
are near, but not identical with, the primary factors. 

The reader should not be misled by the fact that two of 
the primary factors lie along the same lines as Tests 1 
and 7. It was necessary to allow this in devising an ex- 
ample with very few tests in it (to avoid much calculation 
and printing large tables). But with a large number of 
tests the lines of the triangle could have been defined 
without any test being actually at a corner. 

3. Primary factors and reference vectors.—At about this
stage a disturbing thought may have occurred to the 
reader. We have sought for, and obtained, simple 
structure on the reference vectors. That is to say, we 
have found three vectors, three imaginary tests, which are 
uncorrelated each with a group of the actual tests, namely 
where there are zeros in the table on page 173. The entries 
in that table are the projections of the actual tests on the 
reference vectors. 

But the primary factors are different from the reference 
vectors. The projections of the tests on to the primary 
factors will be different and will not show these zeros. 
Those projections are, in fact, given in this table (never 
mind for the moment how it is arrived at):


                F(Λ')⁻¹D
     Structure on the Primary Factors

           L       B       D
   1     .160    .162    .832
   2     .408    .666    .793
   3     .866    .809    .176
   4     .934    .495    .401
   5     .541    .927    .152
   6     .842    .472    .132
   7     .468    .915    .150


These numbers are the correlations between the primary
factors and the tests, and none is zero. The primary 
factor structure is not “ simple,” it is the reference vector 
structure that is simple. Why then not use the reference 
vectors as our factors ? 

A two-fold answer can be given to this, one general, the 
other particular to this example. The latter will become 
clear when we divulge how the example was made. The 
former requires us to return to the distinction between 
structure and pattern. A structure is a table of correlations,
a pattern is a table of coefficients in a “ specification ” 
equation specifying how a test score is made up by factors. 
The entries in a pattern are loadings or saturations of the 
tests with the factors, but not correlations. 

Pattern and structure are only identical when the 
reference vectors are orthogonal and coincide with the 
primary factors. When the reference vectors are oblique 
(usually at obtuse angles) the primary factors are different 
and are themselves usually at acute angles. When the 
primary factors and reference vectors thus separate, the 
structure of the reference vectors and the pattern of the primary 
factors are identical except for a coefficient multiplying 
each column ; and vice versa the structure of the primary 
factors is identical (except for similar coefficients) with the 
pattern of the reference vectors. In particular, where 
there are zeros in the reference vector structure there will 
also be zeros in the primary factor pattern. The general 
theorem of the reciprocity of reference vectors and primary 
factors (to use our present terms), that is, the reciprocity 
of (a) a set of lines orthogonal to hyperplanes, and (b) 
another set of lines which are the intersections in each case 
of the remaining hyperplanes, is an instance of the reci- 
procity which runs through the whole of n-dimensional 
geometry between hyperplanes of k dimensions and of 
(n — k) dimensions. It occurs in several other places in the 
geometry of factorial analysis : for instance, tests, persons, 
and factors are all in one sense reciprocal and exchange- 
able. 

The particular fact about the zeros in the primary factor 
pattern can be seen readily from the geometrical analogy.
For a test vector which lies in a hyperplane can be completely
defined as a weighted resultant of the primary
factors which are also in that hyperplane, without any 
assistance from other primary factors. In our drawing 
of the reader’s study, for example, on page 157, the vector 
of the Test 2, which is the line O2, lies upon the plane 
face O14 of the pyramid, and can be completely described 
by a weighted sum of the primary factors along the edges 
O1 and O4, without bringing in the edge O6 at all. The
primary factor which lies along that edge will therefore 
have a zero weight in the row of the pattern which speci- 
fies Test 2. This pattern on the primary factors will be 
very similar to the structure on the reference vectors 
already given for our example in the table on page 173. 
It can, in fact, be calculated from that table by multiplying 
the first column by 1-163, the second column by 1-166, 
and the third by 1-017, giving the following : 


                 FΛD⁻¹
  (Simple) Pattern on the Primary Factors

           L       B       D
   1     .029    .013    .826
   2     .030    .536    .701
   3     .612    .499    .003
   4     .895   -.001    .266
   5     .096    .880   -.006
   6     .809    .062    .000
   7     .008    .912    .000
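The column multiplication just described is easily sketched. The reference-vector structure and the three multipliers are taken from the text as read; the variable names are illustrative, and since the printed structure is rounded to three places the products can differ from the printed pattern by a unit or so in the last place.

```python
# Structure on the reference vectors, columns L', B', D'.
reference_structure = [
    (0.025, 0.011, 0.812),
    (0.026, 0.460, 0.689),
    (0.526, 0.428, 0.003),
    (0.769, -0.001, 0.262),
    (0.083, 0.755, -0.006),
    (0.696, 0.053, 0.000),
    (0.006, 0.782, 0.000),
]

# One multiplier for each column, as quoted in the text.
multipliers = (1.163, 1.166, 1.017)

# Pattern on the primary factors: each column scaled by its multiplier.
pattern = [tuple(value * m for value, m in zip(row, multipliers))
           for row in reference_structure]
```

Because each column is merely rescaled, every zero of the structure on the reference vectors remains a zero of the pattern on the primary factors.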


Thus although the primary factors differ from the 
reference vectors (the angles between the primary factors 
and their corresponding reference vectors are, in fact, 31°, 
31°, and 11°), yet if the structure on the reference vectors 
is “ simple,” the pattern on the primary factors will be 
“ simple.” The entries in the above table can be used as 
coefficients in specification equations, and if for clearness
we omit the near-zero coefficients entirely, we have found
that the test scores can be considered as made up thus :

Score in Test 1 =                 .826d + Specific
  ,,   ,,  ,,  2 =         .536b + .701d +    ,,
  ,,   ,,  ,,  3 = .612l + .499b         +    ,,
  ,,   ,,  ,,  4 = .895l         + .266d +    ,,
  ,,   ,,  ,,  5 =         .880b         +    ,,
  ,,   ,,  ,,  6 = .809l                 +    ,,
  ,,   ,,  ,,  7 =         .912b         +    ,,


4. Behind the scenes.—It is now time to divulge what 
these “tests” really are and how the “scores” were 
made whose correlations we have been analysing, and to 
compare our analysis with the reality. The example is a 
simpler and shorter variety of a device used by Thurstone 
and published in April 1940 in the Psychological Bulletin. 
The measurements behind the correlations were not made 
on a number of persons, but were made on a number of 
boxes—only eight boxes, to keep down the amount of 
calculation and printing. These boxes were of the follow- 
ing dimensions : 


          Length  Breadth  Depth

    1        2       2       1
    2        3       2       3
    3        3       2       2
    4        6       3       2
    5        4       4       2
    6        5       3       1
    7        5       4       3
    8        4       4       2

  Sum       32      24      16
  Mean       4       3       2


The “tests” were seven functions of these dimensions,
and are shown in the next table, which also shows the 
score each box (or “ person ”) would achieve in that test. 
It is as though someone was unable for some reason to 
measure the primary length, breadth, and depth of these 
boxes (as we are unable to measure the primary factors 
of the mind directly) but was able to measure these more 
complex quantities like LB, or √(L² + D²) (as we are
able to measure scores in complex tests) :

                               Boxes = Persons
Test  Formula        1     2     3     4     5     6     7     8     Sum     Mean

  1   D²             1     9     4     4     4     1     9     4     36     4.500
  2   BD             2     6     4     6     8     3    12     8     49     6.125
  3   LB             4     6     6    18    16    15    20    16    101    12.625
  4   √(L² + D²)   2.24  4.24  3.61  6.32  4.47  5.10  5.83  4.47  36.28    4.535
  5   L + B²         6     7     7    15    20    14    21    20    110    13.750
  6   L² + D         5    12    11    38    18    26    28    18    156    19.500
  7   B              2     2     2     3     4     3     4     4     24     3.000
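The scores can be regenerated from the box dimensions. In the sketch below the formulas for Tests 1 and 7 (D² and B) are assumptions, adopted because they reproduce the sums and means and the product sums quoted in the text; the scores of Test 4 are rounded to two places, as the table appears to do. The names are illustrative.

```python
from math import sqrt

# Length, breadth and depth of the eight boxes.
boxes = [(2, 2, 1), (3, 2, 3), (3, 2, 2), (6, 3, 2),
         (4, 4, 2), (5, 3, 1), (5, 4, 3), (4, 4, 2)]

# The seven "tests" as functions of (L, B, D).
tests = {
    1: lambda L, B, D: D ** 2,                            # assumed formula
    2: lambda L, B, D: B * D,
    3: lambda L, B, D: L * B,
    4: lambda L, B, D: round(sqrt(L ** 2 + D ** 2), 2),
    5: lambda L, B, D: L + B ** 2,
    6: lambda L, B, D: L ** 2 + D,
    7: lambda L, B, D: B,                                 # assumed formula
}

# Score of every box in every test, with the row sums and means.
scores = {t: [f(*box) for box in boxes] for t, f in tests.items()}
sums = {t: sum(row) for t, row in scores.items()}
means = {t: sums[t] / len(boxes) for t in scores}
```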


With these scores the sums of squares and products of 
deviations from the mean are : 


         1      2      3      4      5      6      7

  1     66    50.5   22.5   10.2   25     29      3
  2    50.5   72.9   98.4   16.8  112.3  100.5   16
  3    22.5   98.4  273.9   47.9  259.2  398.5   36
  4    10.2   16.8   47.9   11.4   37.0   91.3    4.7
  5    25    112.3  259.2   37.0  283.5  288     41
  6    29    100.5  398.5   91.3  288    800     36
  7     3     16     36      4.7   41     36      6
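These sums of squares and products can be recomputed from the box dimensions. The sketch assumes the same seven formulas as the table of scores (D² and B for Tests 1 and 7 being assumptions consistent with the printed figures) and rounds the Test 4 scores to two places; the function names are illustrative. The `correlation` helper shows the division by the square roots of the diagonal entries, before any allowance for specific factors, so its values are not the correlations analysed in this chapter.

```python
from math import sqrt

boxes = [(2, 2, 1), (3, 2, 3), (3, 2, 2), (6, 3, 2),
         (4, 4, 2), (5, 3, 1), (5, 4, 3), (4, 4, 2)]

formulas = [
    lambda L, B, D: D ** 2,                            # Test 1 (assumed)
    lambda L, B, D: B * D,                             # Test 2
    lambda L, B, D: L * B,                             # Test 3
    lambda L, B, D: round(sqrt(L ** 2 + D ** 2), 2),   # Test 4
    lambda L, B, D: L + B ** 2,                        # Test 5
    lambda L, B, D: L ** 2 + D,                        # Test 6
    lambda L, B, D: B,                                 # Test 7 (assumed)
]

scores = [[f(*box) for box in boxes] for f in formulas]


def deviations(values):
    """Deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


devs = [deviations(row) for row in scores]

# Sums of squares (diagonal) and products (off-diagonal) of deviations.
ssp = [[sum(p * q for p, q in zip(a, b)) for b in devs] for a in devs]


def correlation(i, j):
    """Correlation of tests i and j with no allowance for specifics."""
    return ssp[i][j] / sqrt(ssp[i][i] * ssp[j][j])
```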


From these the correlations could be calculated by dividing 
each row and column by the square root of the diagonal 
cell entry. But that would make no allowance for specific 
factors, which in all actual psychological tests play a 
considerable part. In the example devised by Thurstone 
on which this is modelled there are no specific factors, but 
it was decided to introduce them here into Tests 5, 6, and 7, 
by increasing their sums of squares. In addition, by an 
arithmetical slip, a small group factor was added to these 
three tests, and this was not discovered for some time. It 
was decided to leave it, for in a way it makes the example 
more realistic, and may be taken to represent an experi- 
mental error of some sort running through these three tests. 

With these changes, the correlations are found, and are 
those with which we began this chapter and which we have 
already analysed into three oblique factors L, B, and D.
Let us now compare that analysis with the formulæ which
we now know to represent the tests. The pattern on
page 177, for example, shows that Test 2 depends only on
factors B and D: and that is correct, for it was, in fact, 
their product BD, and L did not enter into it. The 
analysis gives the test score as a linear function of B and D, 


.536b + .701d


whereas it was really a product. But the analysis was 
correct in omitting L. Similarly, the analyses into the 
other factors can be compared with the actual formulæ,
and in almost every case the factorial analysis, except for 
being linear, is in agreement with the actual facts. Tests 5 
and 6, true, appear in the analysis to omit factors L and D 
respectively, although these dimensions figured in their 
formulæ. But it would appear that they were swamped
by reason of the other dimension in the formulæ being
squared ; and also possibly the specific and error factors
we added did something towards obscuring smaller details.
Also the process of “ guessing ” communalities, though
innocuous in a battery of many tests, is a source of con- 
siderable inaccuracy when, as here, the tests are few. 

5. Box dimensions as factors—We can now explain the 
particular reason for selecting the primary factors, and not 
the reference vectors, as our fundamental entities. The 
fundamental entities in the present example can reason- 
ably be said to be the length, breadth, and depth of the 
boxes, given in the table on page 178. Now, the columns 
of that table are correlated with one another, as the reader 
can readily check, the correlation coefficients being— 


L with B, .589
L  ,,  D, .144
B  ,,  D, .204
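The reader who wishes to make this check may find the following sketch convenient. The dimensions are those of the eight boxes in the table; the function name is illustrative.

```python
from math import sqrt

length = [2, 3, 3, 6, 4, 5, 5, 4]
breadth = [2, 2, 2, 3, 4, 3, 4, 4]
depth = [1, 3, 2, 2, 2, 1, 3, 2]


def correlation(x, y):
    """Product-moment correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_lb = correlation(length, breadth)
r_ld = correlation(length, depth)
r_bd = correlation(breadth, depth)
```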


These correlations are due to the fact that a long box 
naturally tends to be large in all its dimensions. It could, 
of course, be very, very shallow, but usually it is deep and 
broad. 

The reference vectors were, it is true, correlated, but 
negatively. They were at obtuse angles with one another 
(see page 174) and obtuse angles have negative cosines 
corresponding to negative correlations. So the reference
vectors do not correspond to the fundamental dimensions
length, breadth, and depth. 

What, then, are the angles—and hence the correlations— 
between the primary factors? We shall find that they 
are acute angles, and their cosines agree reasonably well 
with the above correlations between the length, breadth, 
and depth. The algebraic method of finding these angles 
is given in the mathematical appendix, but it is perhaps 
desirable to give a less technical account of it here. We 
need the direction-cosines of the primary factors, that is, 
the cosines of the angles they make with the orthogonal
centroid axes. Each primary factor is the intersection 
of n — 1 hyperplanes—in our simple case is the intersection 
of two planes. 

In n-dimensional geometry a linear equation defines a 
hyperplane of n — 1 dimensions. For example, in a plane 
of two dimensions a linear equation is a line (of one dimen- 
sion)—hence the name linear. But in a space of three 
dimensions a “linear” equation like ax + by + cz = 0
is a plane. Two such equations define the line which is 
the intersection of two planes. 

Now, the equations of the three planes which form the 
triangular pyramid of which we have previously spoken 
are just those equations we have already obtained and 
used in our example, viz. : 

.405x + .426y + .809z = 0
.464x + .076y - .883z = 0
.338x - .916y + .215z = 0


These equations taken two at a time define the three 
edges of the pyramid, which are our primary factors, and 
if we express each pair in the form— 


                x/a = y/b = z/c

the direction cosines are proportional to a, b, and c,
which only require normalizing to be the direction cosines.
When the direction cosines are found in this way, and
written in columns to form a matrix, they prove to have
the values—

   .797    .835    .503
   .400    .187   -.843    = (Λ')⁻¹D
   .453   -.517    .192


This is the rotating matrix to obtain the projections, i.e. 
the structure, on the primary factors, and if the centroid 
loadings on page 172 are post-multiplied by this there 
results the table we have already quoted on page 175. 

The above matrix, premultiplied by its transpose, gives 
the cosines of the angles between the primary factors. We 
obtain— 


    1     .506   .150
  .506     1     .164    = DC⁻¹D
  .150   .164     1


Compare these with the correlations between the columns 
of dimensions of the boxes, viz. : 


    1     .589   .144
  .589     1     .204
  .144   .204     1


The resemblance is quite good, and shows that it is the 
primary factors, and not the reference vectors, which 
represent those fundamental although correlated dimen- 
sions of length, breadth, and depth in the boxes. 
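The whole of this calculation can be sketched in a few lines. In three dimensions the line in which two planes through the origin intersect is at right angles to both their normals, and so lies along the cross-product of the two reference vectors concerned. The sketch assumes the reference vectors as read from the rotating matrix above, and it fixes the sign of each edge so that it makes an acute angle with its own reference vector; that sign rule and all names are assumptions for illustration.

```python
from math import sqrt

# Direction cosines of the reference vectors L', B', D'.
REFERENCE = [
    (0.405, 0.426, 0.809),
    (0.464, 0.076, -0.883),
    (0.338, -0.916, 0.215),
]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def normalize(v):
    length = sqrt(dot(v, v))
    return tuple(x / length for x in v)


def primary_factor(k):
    """Edge where the two planes other than plane k intersect."""
    others = [v for i, v in enumerate(REFERENCE) if i != k]
    edge = normalize(cross(others[0], others[1]))
    if dot(edge, REFERENCE[k]) < 0:      # assumed sign convention
        edge = tuple(-x for x in edge)
    return edge


primary = [primary_factor(k) for k in range(3)]

# Cosines of the angles between the primary factors.
cosines = [[dot(a, b) for b in primary] for a in primary]
```

The three vectors in `primary` agree to three places with the columns of the matrix of direction cosines given above, and the off-diagonal entries of `cosines` with .506, .150, and .164.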

6. Criticisms of simple structure —Thurstone’s argument 
is then, of course, that as this process of analysis leads to 
fundamental real entities in the case of the boxes (and 
also in his “trapezium” example, Thurstone, 1944a, 
page 84, with four oblique factors), it may be presumed to
give us fundamental entities when it is applied to mental 
measurements. And I confess that the argument is very 
strong. 

My fears or doubts arise from the possibility that the 
argument cannot legitimately be reversed in this way. 
There is no doubt that if artificial test scores are made up 
with a certain number of common factors, simple structure 
(oblique if necessary) can be reached and the factors
identified. But are there other ways in which the test
scores could have been made ? Spearman’s argument was
a similar reversal. If test scores are made with only one 
common factor, then zero tetrad-differences result. But 
zero tetrad-differences can be approached as closely as we 
like by samples of a large number of small factors, with 
very few indeed common to all the tests. 

However, Thurstone’s simple structure is a much more 
complex phenomenon than Spearman’s hierarchical order, 
and yet he seems to have had no great difficulty in finding 
batteries of tests which give simple structure to a reason- 
able approximation. I am not sceptical, merely cautious, 
and admittedly much impressed by Thurstone’s ability
both in the mathematical treatment and in the devising
of experiments. 

Thurstone might, I think, put his case in this way. He 
assembles a battery of tests which to his psychological 
intuition appear to contain such and such psychological
factors, some being memory tests, some numerical, etc.,
etc., no test, however, containing (to his mind) all these
expected factors. He then submits their correlations to
his calculations, reaches oblique simple structure, and
compares this analysis with his psychological expectation.
If there is agreement, he feels confirmed both in his psy-
chology and in the efficacy of his method of finding factors
mathematically. Usually there will not be complete
agreement, and he is led to modify his psychological ideas
somewhat, in a certain direction. To test the truth of these
later ideas he again makes and analyses a battery.
Especially he looks to see if the same factors turn up in
various batteries. He uses his analyses as guides to
modifications of his psychological hypotheses, or as con-
firmation of them. In Great Britain Thurstone’s hypo-
thesis of simple structure has been, I think it is correct to
say, rather ignored than criticized. Most British psycho-
logists have imbibed during their education a belief in and
a partiality for “Spearman’s g,” a factor apparently
abolished by Thurstone. Since his work on second-order
factors rehabilitates g, this objection may disappear.

Reyburn and Taylor of South Africa have, however,
criticized simple structure shrewdly (1943a, and a later
paper by Reyburn and Raath, 1949) even although they 
themselves do not insist on a g (see 1941a, pages 253, 254, 
258). 

An early form of response to Thurstone’s work was to 
show that his batteries could also be analysed after Spear- 
man’s fashion. Holzinger and Harman (1938), using the 
Bifactor method, reanalysed the data of Thurstone’s 
Primary Mental Abilities and found an important general 
factor due, as they truly say, “ to our hypothesis of its 
existence and the essentially positive correlations through- 
out.” Spearman (1939) in a paper entitled Thurstone’s 
Work Reworked reached much the same analysis, and raised 
certain practical or experimental objections, claiming that 
his g had merely been submerged in a sea of error. But 
there is more in it than that. As I said in my contribution 
to the Reading University Symposium (1939) Thurstone 
could correct all the blemishes pointed out by Spearman
and would still be able to attain simple structure. I said 
on that occasion that however juries in America and in 
Britain might differ at present, the larger jury of the future 
would decide by noting whether Spearman’s or Thurstone’s 
system had proved most useful in the hands of the prac- 
tising psychologist. I now think that they will certainly 
also consider which set of factors has proved most invariant 
and most real. Very likely the two criteria may lead to 
the same verdict. But for the present the two rival claims 
are in the position described by the Scottish legal phrase, 
“taken ad avizandum.” 

7. Application of multiple-factor analysis to industrial test 
data.—Dr. R. Harper, with various co-workers, has applied 
these methods of factor analysis, begun in connexion with 
psychological tests, to tests of a physical kind on various 
substances during their manufacture. In Nature of 
November 20th, 1948, Harper and Baron wrote: “In 
industrial physics there are occasions when empirical tests 
are employed the exact meaning of which is not fully under- 
stood, and where the interrelationships between the tests 
could profitably be studied by similar means ” to those 
used in psychology, and they described a centroid analysis,
without rotation, of rheological measurements on cheese.
In the British Journal of Applied Physics of January, 1950, 
Harper, Kent, and Blair gave an account of the factorial 
analysis of ten tests (seven rheological and three electrical) 
on a group of plastics (polyvinyl-chloride-plasticizer mixes). 
They made a centroid analysis, with four iterations, took 
out three factors, and rotated them orthogonally to 
maximize the number of near-zero loadings. They tried 
also other rotations, including one to an approximate 
oblique simple structure, and suggest interpretations of the 
factors arrived at. 


CHAPTER XIII 
SECOND-ORDER FACTORS 


1. A second-order general factor—The reason why the 
factors arrived at in the “ box” example were correlated 
was that large boxes tend to have all their dimensions 
large. There is a typical shape for a box, often departed 
from, yet seldom to an extreme degree. Therefore the
length, breadth, and depth of a series of boxes are corre- 
lated, and so also are Thurstone’s primary factors in such 
a case. There is a size factor in boxes, a general factor 
which does not appear as a first-order factor (those we 
have been dealing with) in Thurstone’s analysis, but 
causes these primary factors to be correlated. Possibly, 
therefore, when oblique factors appear in the factorial 
analysis of psychological tests, there is a hidden general 
factor causing the obliquity. This factor or factors (for 
there might be more than one) can be arrived at by analys- 
ing the first-order factors, into what Thurstone calls 
second-order factors, factors of the factors. 

Of course, whether such a procedure could be justified 
by the reliability of the original experimental data is very 
doubtful in most psychological experiments. The super- 
structure of theory and calculation raised upon those data 
is already, many would urge, perhaps rather top-heavy, and 
to add a second storey unwise. But we should not, I think, 
let this practical question deter us from examining what is 
undoubtedly a very interesting and illuminating suggestion, 
which may turn out to be the means of reconciling and 
integrating various theories of the structure of the mind. 

If we take the primary factors of our “ box ” example of 
Chapter XII, they were correlated as shown in this matrix : 


\[ \begin{pmatrix} 1 & .506 & .150 \\ .506 & 1 & .164 \\ .150 & .164 & 1 \end{pmatrix} \]


If we analyse these in their turn into a general factor 
and specifics we obtain, using the formula— 

\[ g \text{ saturation of factor } i = \left( \frac{r_{ij}\, r_{ik}}{r_{jk}} \right)^{1/2} \]


the saturations of the primary factors with a second-order 
g as .680, .744, and .220; and each primary factor will 
also have a factor specific. We have now replaced the 
analysis of the original tests into three oblique factors by 
an analysis into four orthogonal factors, one of them 
general to the oblique factors and presumably also general 
to the original tests, though that we have still to inquire 
into. We must also inquire into the relationship of the 
specifics of the original tests to these second-order factors, 
which are no longer in the original three-dimensional 
common-factor space, but in a new space of four dimen- 
sions. Are the original test-specifics orthogonal to this 
new space? 

With only three oblique factors, an analysis into one g 
is always possible (except in the Heywood case, which will 
often occur among oblique factors). If there had been 
four or more oblique factors, we would have had to use more 
second-order general factors unless the tetrad-differences 
were zero. Thurstone’s “ trapezium” example already 
referred to had four oblique factors, and his article should 
be consulted by the interested. 
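
The triad formula above can be checked numerically. The following is a minimal sketch in Python, assuming the three primary-factor correlations quoted in the matrix (.506, .150, .164); the function name `g_saturations` is hypothetical and chosen only for this illustration.

```python
import math

def g_saturations(r12, r13, r23):
    """Triad formula: the g saturation of variable i is sqrt(r_ij * r_ik / r_jk)."""
    squared = (r12 * r13 / r23,   # variable 1
               r12 * r23 / r13,   # variable 2
               r13 * r23 / r12)   # variable 3
    if any(s > 1.0 for s in squared):
        # A squared saturation above unity is the Heywood case.
        raise ValueError("Heywood case: no admissible single general factor")
    return [math.sqrt(s) for s in squared]

# Correlations between the three primary factors of the box example.
saturations = g_saturations(0.506, 0.150, 0.164)
print([round(s, 3) for s in saturations])
```

The guard on squared saturations above unity is one simple way of detecting the Heywood case mentioned in the text; other checks are possible.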

2. Its correlations with the tests —Let us turn now to the 
question what the correlations are between the seven 
original tests and the above second-order g. To obtain 
these Thurstone uses an argument equivalent to the fol- 
lowing : 

We may first note that each reference vector makes an 
acute angle with its own primary factor, but is at right 
angles to every other primary factor, for these are all 
contained in the hyperplane to which it is orthogonal. 
The cosines of the angles can be obtained by premulti- 
plying the rotation matrix of the reference vectors by 
the transpose of the rotation matrix of the primary 
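factors. 

Correlations between Primary Factors and Reference Vectors 

\[ D\Lambda^{-1} \times \Lambda = D \]

\[
\begin{pmatrix} .797 & .400 & .453 \\ .835 & .187 & -.517 \\ .503 & -.843 & .192 \end{pmatrix}
\times
\begin{pmatrix} .405 & .464 & .338 \\ .426 & .076 & -.916 \\ .809 & -.883 & .215 \end{pmatrix}
=
\begin{pmatrix} .860 & \cdot & \cdot \\ \cdot & .858 & \cdot \\ \cdot & \cdot & .983 \end{pmatrix}
\]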


These cosines in the diagonal of the matrix D give us the 
angles 31°, 31°, and 11° which we have already mentioned 
on page 177 as the angles between each primary factor and 
its own reference vector. 

Each row of the first of the above matrices represents 
the projections of the primary factor on to the orthogonal 
centroid axes. These are, in fact, the loadings of the prim- 
ary factors, thought of as imaginary or possible tests, 
in the orthogonal centroid factors I, II, and III. Following 
Thurstone, we add these three rows below the seven rows of 
our original seven real tests, extending the matrix F in 
length thus : 


            I       II      III   |   r_g
    1     .449   −.682    .165   |  .211  \
    2     .825   −.478   −.129   |  .574   |
    3     .906    .336    .020   |  .787   |
    4     .846    .133    .457   |  .666   |  wanted
    5     .808    .208   −.412   |  .719   |
    6     .697    .336    .335   |  .597   |
    7     .767    .173   −.468   |  .683  /
    L     .797    .400    .453   |  .680  \
    B     .835    .187   −.517   |  .744   |  known
    D     .503   −.843    .192   |  .220  /


This lengthened matrix we want to post-multiply by 
a column vector (ψ in Thurstone’s notation) to give the 
correlations of the tests, including the imaginary tests 
L, B, and D, with the second-order g. In other words, we 
want to know by what weights each column must be multiplied 
so that the weighted sum of each row is the correlation 
of that test with g. Suppose these weights are u, v, 
and w. Since we already know from our second-order 
analysis what r_g is for each of the primaries L, B, and D, 
we have three equations for u, v, and w, the solution of 
which gives us their values. We have— 
\[ \begin{aligned}
.797u + .400v + .453w &= .680 \\
.835u + .187v - .517w &= .744 \\
.503u - .843v + .192w &= .220
\end{aligned} \]
and these equations can be solved in the usual way, if 
the reader wishes. The values are .798, .198, and −.077. 
A closer examination of them, however, which can be 
most readily expressed in matrix notation, leads to an 
easier plan—especially desirable if the number of primary 
factors were greater. In matrix form the above equations 
are— 
\[ T\psi = r_g, \]
whence 
\[ \psi = T^{-1} r_g ; \]
and since T is merely a short notation for DΛ⁻¹ we have— 
\[ \psi = (D\Lambda^{-1})^{-1} r_g = \Lambda D^{-1} r_g . \]
That is to say, the centroid loadings F of the seven tests 
have to be post-multiplied by this, giving a matrix (a 
single column)— 
\[ F\psi = F\Lambda D^{-1} r_g . \]
But FΛ we already know. It is (see page 173) the simple 
structure V on the reference vectors. So we merely have 
to multiply the columns of V by D⁻¹r_g, and add the rows to 
get the correlation of each test with g. These multipliers 
are, that is to say : 
\[ \begin{aligned}
.680 \div .860 &= .791 \\
.744 \div .858 &= .867 \\
.220 \div .983 &= .224
\end{aligned} \]
The results are the same as by the former method, except 
for discrepancies due to rounding off decimals, and are 
given to the right of the preceding table. 
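
The first of the two methods (solving the three equations for u, v, and w and applying the weights to every row of F) can be sketched as follows. This is only an illustration under stated assumptions: the helper `solve3` is a hypothetical name, it uses Cramer's rule rather than any method named in the text, and the three-decimal inputs mean that results agree with the printed values only to about the third decimal place.

```python
def solve3(a, b):
    """Solve a 3x3 linear system a * x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    solution = []
    for col in range(3):
        m = [row[:] for row in a]          # copy, then swap in the right-hand side
        for i in range(3):
            m[i][col] = b[i]
        solution.append(det(m) / d)
    return solution

# Loadings of the primaries L, B, D on the centroid axes, and their known g saturations.
T = [[.797, .400, .453],
     [.835, .187, -.517],
     [.503, -.843, .192]]
r_g_primaries = [.680, .744, .220]
psi = solve3(T, r_g_primaries)             # the weights u, v, w

# Centroid loadings F of the seven tests.
F = [[.449, -.682, .165], [.825, -.478, -.129], [.906, .336, .020],
     [.846, .133, .457], [.808, .208, -.412], [.697, .336, .335],
     [.767, .173, -.468]]
r_g_tests = [sum(f * p for f, p in zip(row, psi)) for row in F]
print([round(p, 3) for p in psi])
print([round(r, 3) for r in r_g_tests])
```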
3. A g plus an orthogonal simple structure.—In his own 
examples, Thurstone has not calculated the loadings of the 
original tests with the other orthogonal second-order 
factors, the factor specifics. This can, however, clearly be 
done by the same method as above. Since the correlations 
of the general factor with the three oblique factors are 
.680, .744, and .220, the correlations of each factor specific 
with its own oblique factor are .733, .668, and .975. For 
example, .733² = 1 − .680². The second-order analysis 
therefore is : 


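\[ \begin{pmatrix} .680 & .733 & \cdot & \cdot \\ .744 & \cdot & .668 & \cdot \\ .220 & \cdot & \cdot & .975 \end{pmatrix} = E \]

Dividing the rows by the divisors already mentioned, viz. 
.860, .858, and .983, we obtain the matrix : 

\[ \begin{pmatrix} .791 & .853 & \cdot & \cdot \\ .867 & \cdot & .779 & \cdot \\ .224 & \cdot & \cdot & .992 \end{pmatrix} = D^{-1}E \]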


and when the matrix V is post-multiplied by this we 
obtain the following analysis of the original seven tests 
into a general factor plus an orthogonal simple structure 
of three factors : 


General Factor plus Simple Structure 

\[ G = VD^{-1}E \]

            g       λ       β       δ
    1     .211    .021    .009    .805
    2     .574    .022    .358    .683
    3     .787    .449    .333   −.006
    4     .666    .656   −.001    .260
    5     .719    .071    .588   −.006
    6     .597    .593    .041    .000
    7     .683    .005    .609    .000


The zero or very small entries in λ, β, and δ are in the 
same places as they are for L’, B’, and D’ in the oblique 
simple structure V (see page 173). What we have now 
done is to analyse the box data into four orthogonal 
factors corresponding to size, and ratios of length, breadth, 
and depth. In terms of our pyramidal geometrical 
analogy we have “ taken out a general factor” by depress- 
ing the ceiling of our room, squashing the pyramid down 
until its three plane sides are at right angles to each other. 

The above structure, being on orthogonal factors, is also 
a pattern, so that the inner products of its rows ought to 
give the correlation coefficients with the same accuracy, if 
we have kept enough decimal places in our calculations, as 
do the rows of the centroid analysis F: and so they do. 
For example, the correlation between Tests 1 and 2 is, 
from F, 

\[ .449 \times .825 + .682 \times .478 - .165 \times .129 = .675 \]

and from G it is— 

\[ .211 \times .574 + .021 \times .022 + .009 \times .358 + .805 \times .683 = .675 \]

The “experimental” value was .728, the difference of 
.053 being due to the inaccuracy of the guessed 
communalities, or in an actual experimental set of data to 
sampling error and to the rank of the matrix not being 
exactly three. 
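
The whole of this section can be retraced numerically. The sketch below, in Python, assumes the matrices F, Λ, and D and the g saturations printed above, forms V = FΛ and G = VD⁻¹E, and then compares the inner product of rows 1 and 2 in F and in G. The names `matmul`, `LAMBDA`, and `D_INV_E` are hypothetical, and three-decimal inputs limit the agreement to roughly the third decimal place.

```python
import math

def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inner(u, v):
    """Inner product of two rows."""
    return sum(x * y for x, y in zip(u, v))

F = [[.449, -.682, .165], [.825, -.478, -.129], [.906, .336, .020],
     [.846, .133, .457], [.808, .208, -.412], [.697, .336, .335],
     [.767, .173, -.468]]
LAMBDA = [[.405, .464, .338], [.426, .076, -.916], [.809, -.883, .215]]
D = [.860, .858, .983]            # diagonal of D
R_G = [.680, .744, .220]          # second-order g saturations of L, B, D

V = matmul(F, LAMBDA)             # oblique simple structure on the reference vectors

# Rows of D^-1 E: g saturation, then the factor specific in its own column.
D_INV_E = []
for i in range(3):
    row = [R_G[i] / D[i], 0.0, 0.0, 0.0]
    row[i + 1] = math.sqrt(1 - R_G[i] ** 2) / D[i]
    D_INV_E.append(row)

G = matmul(V, D_INV_E)            # columns g, lambda, beta, delta

r12_from_F = inner(F[0], F[1])
r12_from_G = inner(G[0], G[1])
print(round(r12_from_F, 3), round(r12_from_G, 3))
```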

We can see here a distinct step towards a reconciliation 
between the analyses of the Spearman school and those 
of Thurstone using oblique factors. But we must not 
forget that if the oblique factors are not oblique enough, 
the Heywood embarrassment will occur, and a second-order 
g be impossible. The orthogonal factors of G are 
more convenient to work with statistically, but it is possible 
that the oblique factors of V are more realistic both in our 
artificial box example and in psychology. They corre- 
sponded in our case to the actual length, breadth, and 
depth of the boxes. The factors λ, β, and δ of matrix G 
correspond to these dimensions after the boxes have all 
been equalized in “ size.” 


PART IV 
THE ESTIMATION OF FACTORS * 


* This use of the word “estimation” has been criticized. By 
statisticians the word is restricted to mean the estimation of un- 
known parameters from a sample, a process of inference from 
sample to parent population. Here the word is used to mean the 
“estimation” of a man’s scores in a test (or vocation or examina- 
tion) to which he has not been subjected, from a knowledge of his 
behaviour in other tests. Factors are imaginary tests and a man’s 
score in them can be “estimated ” in the same way. I would use 
another word if I could, but “estimation ” seems the natural ex- 
pression. Besides, I think the two meanings are fundamentally 
alike. 



CHAPTER XIV 
REGRESSION AND MULTIPLE CORRELATION 


1. Correlation coefficient as estimation coefficient.—A 
correlation coefficient indicates the degree of resemblance 
between two lists of marks : and therefore it also indicates 
the confidence with which we can estimate a man’s position 
in one such list x if we know his position in the other y. 
If the correlation between two lists is perfect (r =1), 
we know that his standardized score* in the one list is 
exactly the same as in the other (x = y). 

If the correlation between the two lists is zero (r = 0), 
then the knowledge of a man’s position in the one list tells 
us nothing whatever about his position in the other list. 
If we are compelled to make an estimate of that, we can 
only fall back on our knowledge that most men are near 
the average and few men are very good or very bad in any 
quality. We have, therefore, most chance of being correct 
if we guess that this man is average in the unknown test. 
(x = 0. The average mark we have agreed to call zero ; 
marks above average, positive; marks below average, 
negative.) 

In the first case, when r = 1, we are justified in equating 
his unknown score x to his known score y— 

\[ x = y \]

In the second case, when r = 0, we are compelled by our 
ignorance to take refuge in— 

\[ x = 0 \text{ or average.} \]

Both these statements can be summed up in the one 
statement— 

\[ \hat{x} = ry \]

where the circumflex mark over the x is meant to indicate 
that this is an estimated, not a measured, value. If, now, 
we consider a case between these, where the correlation is 
neither perfect nor zero, it can be shown that this equation 
still holds, provided each score is measured in standard 
deviation units. Since r is always a fraction, this means 
that we always estimate his unknown x score as being 
nearer the average than his known y score. That is 
because we know that men tend to be average men. If 
this man’s y score is high, say— 

\[ y = 2 \]

(two standard deviations above the average), and if the 
correlation between the qualities x and y is known to be 
r = .5, we guess his position in the x test as being— 

\[ \hat{x} = ry = .5 \times 2 = 1 \]

i.e. only one standard deviation above the average. This 
is a guess influenced by our two pieces of knowledge, 
(1) that he did very well in Test y, which is correlated with 
Test x, and (2) that most men get round about an average 
score (zero). It is a compromise, an estimate. It will 
often be wrong; indeed, very seldom will it be exactly 
right. But it will be right on the average, it will as often 
be an underestimate as an overestimate, in each array 
of men who are alike in y. The correlation coefficient, 
then, is an estimation coefficient for tests measured in 
standard deviation units. 

* A test score in what follows always means a standardized score 
unless the contrary is stated. But estimates are not in standard 
measure in general. 
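
One way of seeing why the estimate is ry in the intermediate case is a least-squares argument. The sketch below is an assumption of this note rather than the argument the text has in mind: it takes "best" to mean smallest mean squared error, with x and y both in standard measure (mean 0, variance 1), so that the mean product of x and y is r. The symbol b is a trial weight for y.

```latex
\begin{aligned}
E\big[(x - by)^2\big] &= E[x^2] - 2b\,E[xy] + b^2 E[y^2] = 1 - 2br + b^2, \\
\frac{d}{db}\big(1 - 2br + b^2\big) &= -2r + 2b = 0 \;\Longrightarrow\; b = r, \\
\min_b E\big[(x - by)^2\big] &= 1 - r^2 .
\end{aligned}
```

On these assumptions the weight r is the best single multiplier of y, and the remaining error variance 1 − r² falls to zero when r = 1 and rises to unity when r = 0, in agreement with the two extreme cases described above.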

2. Three tests—Suppose now that we have three tests 
whose intercorrelations are known, and that a man’s scores 
on two of them, y and z, are known. We wish to estimate 
what his score will most probably be in the other test, x. 
x need not be a test in the ordinary sense of the word, but 
may be an occupation for which the man is a candidate 
or entrant. According as we use his known y or his 
known z score, we shall have two estimates for his x score. 
To fix our ideas, let us take definite values for the 
correlations, say : 

              x      y      z
        x    1.0    .7     .5
        y     .7   1.0     .3
        z     .5    .3    1.0

The two estimates for his x are then— 

\[ \hat{x} = .7y \]

\[ \hat{x} = .5z \]



and of these we shall have rather more confidence in the 
estimate associated with the higher correlation. But we 
ought to have still more confidence in an estimate derived 
from both y and z. Such an estimate could use not only 
the knowledge that y and z are correlated with x, but also 
the knowledge that they are correlated to an extent of 
r = .3 with each other. Just to take the average of the 
above two separate estimates will not utilize this knowledge, 
nor will it utilize the fact that the estimate from y (r = .7) 
is more worthy of confidence than the estimate from 
z (r = .5). 

What we want is to know how to combine the two scores 
y and z into a weighted total 


\[ (by + cz) \]


which will have the highest possible correlation with x. 
Such a correlation of a best-weighted total with another 
test is called a multiple correlation. From such a weighted 
total of his two known scores we could then estimate the 
man’s x score more accurately than from either the y or 
the z score alone. It must use all the information we have, 
including our information that y and z correlate to an 
amount r = .3. 

3. The straight sum and the pooling square.—In order to 
answer this question, we shall first consider the problem 
of finding the correlation of the straight unweighted sum 
of the scores y + z with x. This is the simplest form of a 
problem to which a general answer was given by Professor 
Spearman (Spearman, 1918). 

We shall put his formula into a very simple form, which 
we may call a pooling square. In our present instance we 
want to find the correlation of y + z with x (all of these 
being, we are assuming, measured in standard deviation 
units). We divide the matrix of correlations by lines 
separating the “criterion” x from the “battery” y + z 
thus : 

              x   |    y      z
        x    1.0  |   .7     .5
            ------+--------------
        y     .7  |  1.0     .3
        z     .5  |   .3    1.0


In each of the quadrants of this pooling square (with 
unities in the diagonal, be it noted) we are going to form 
the sum of all the numbers, and we shall indicate these 
sums by the letters : 

              A | C
             ---+---
              C | B

(where C is the sum of the Cross-correlations between the 
battery y + z and the criterion x, which can be regarded 
as a second battery of one test only). 

Then the correlation of x with y + z is equal to— 

\[ \frac{C}{\sqrt{AB}} \]

which in our present example is— 

\[ \frac{.7 + .5}{\sqrt{(1) \times (1 + .3 + .3 + 1)}} = \frac{1.2}{\sqrt{2.6}} = .744 \]

so that the battery (y + z) has a rather better correlation 
(.744) with x than has either of its members (.7 and .5). 
From the straight sum of the man’s scores in the two tests 
y and z we can therefore in this case get a better estimate 
of his score in x than we could get from either alone. 
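
The pooling square for a straight sum lends itself to a few lines of code. The following Python sketch assumes the three-test correlation matrix of this section; the function name `pooled_correlation` and the index convention (0 for x, 1 for y, 2 for z) are choices made only for this illustration.

```python
import math

def pooled_correlation(R, criterion, battery):
    """Correlation between the straight sums of two groups of tests.

    R is the full correlation matrix with unities in the diagonal;
    `criterion` and `battery` are lists of row/column indices.
    """
    A = sum(R[i][j] for i in criterion for j in criterion)   # criterion quadrant
    B = sum(R[i][j] for i in battery for j in battery)       # battery quadrant
    C = sum(R[i][j] for i in criterion for j in battery)     # cross-correlations
    return C / math.sqrt(A * B)

R = [[1.0, 0.7, 0.5],
     [0.7, 1.0, 0.3],
     [0.5, 0.3, 1.0]]
r_sum = pooled_correlation(R, criterion=[0], battery=[1, 2])
print(round(r_sum, 3))
```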

4. The pooling square with weights—We want, however, 
to know whether a weighted sum of y and z will give a still 
higher combined correlation with x. With sufficient 
patience, we could answer this by trial and error, for the 
pooling square enables us to find almost as easily the 
correlation of a weighted battery with the criterion.* Let 
us, for example, try the battery 3y + z. For this purpose 
we write the weights along both margins of the pooling 
square : 

                        3      1
                1.0 |   .7     .5
          3      .7 |  1.0     .3
          1      .5 |   .3    1.0

* The pooling square can also be used to find the correlations or 
covariances of weighted batteries with one another. Elegant 
developments are Hotelling’s ideas of the most predictable criterion 
(1935a) and of vector correlation (1936). 
and multiply both the rows and the columns by these weights 
before forming the sums A, B, and C. The result of the 
multiplications in our case is : 



          1.0 |  2.1    .5
          2.1 |  9.0    .9        =        1.0 |  2.6
           .5 |   .9   1.0                 2.6 | 11.8

and we therefore have— 

\[ \text{correlation} = \frac{2.6}{\sqrt{11.8}} = .757 \]

a higher value than .744 given by the simple sum. So we 
have improved our estimation of the man’s x score, and 
estimates made by taking 3y + z would correlate .757 
with the measured values of x. 
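
The weighted pooling square, and the search by trial and error that the text alludes to, might be sketched as follows in Python. The function name `weighted_correlation`, the convention that the criterion is row 0 with weight 1, and the grid of trial ratios (0.1 to 5.0 in steps of 0.1) are all assumptions made for illustration.

```python
import math

def weighted_correlation(R, weights):
    """Correlation of the criterion (row 0) with a weighted sum of the other tests."""
    n = len(R)
    w = [1.0] + list(weights)
    # Multiply both the rows and the columns by the weights.
    P = [[w[i] * w[j] * R[i][j] for j in range(n)] for i in range(n)]
    A = P[0][0]
    C = sum(P[0][1:])
    B = sum(P[i][j] for i in range(1, n) for j in range(1, n))
    return C / math.sqrt(A * B)

R = [[1.0, 0.7, 0.5],
     [0.7, 1.0, 0.3],
     [0.5, 0.3, 1.0]]

r_3y_plus_z = weighted_correlation(R, [3, 1])

# Trial and error: try weights k/10 for y against a weight of 1 for z.
best_r, best_ratio = max((weighted_correlation(R, [k / 10, 1]), k / 10)
                         for k in range(1, 51))
print(round(r_3y_plus_z, 3), best_ratio, round(best_r, 3))
```

Only the ratio of the two weights affects the correlation, which is why a one-dimensional search over the ratio is enough here.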

5. Regression coefficients and multiple correlation.— 
Similarly we could try other weights for y and z and search 
by trial and error for the best. There is, however, a general 
answer to this question, namely that the best weights for 
y and z are proportional to certain minor determinants of 
the correlation matrix. The weight for y is proportional to 
the minor left when we cross out the criterion column and 
the y row, the weight for z is proportional to minus the 
minor left when we similarly cross out the criterion column 
and the z row. The matrix of correlations with the 
criterion column deleted being : 

               y      z
        x     .7     .5
        y    1.0     .3
        z     .3    1.0

the weight for y is therefore proportional to : 

\[ \begin{vmatrix} .7 & .5 \\ .3 & 1.0 \end{vmatrix} = .55 \]

and that for z is proportional to : 

\[ -\begin{vmatrix} .7 & .5 \\ 1.0 & .3 \end{vmatrix} = .29 \]

that is, they are as .55 : .29. To make these weights not 
merely proportional but absolute values we must divide 
each of them by the minor left when the row and column 
concerned with the “criterion” x are deleted, namely : 

\[ \begin{vmatrix} 1.0 & .3 \\ .3 & 1.0 \end{vmatrix} = .91 \]

so that these absolute best weights, for which the technical 
name is “regression coefficients,” are— 

\[ \frac{.55}{.91}\,y + \frac{.29}{.91}\,z \quad \text{or} \quad .6044y + .3187z \]



We are inviting the reader to take this method of calculating 
the regression coefficients on trust ; but he can at least 
satisfy himself that when applied to the pooling square they 
give a higher correlation of battery with criterion than any 
other weights do. The result of multiplying the y column 
and row by .6044, and the z column and row by .3187, is 
the following : 

                    .6044   .3187
            1.0  |   .7      .5              1.0000 | .4231   .1593
   .6044     .7  |  1.0      .3       =       .4231 | .3653   .0578
   .3187     .5  |   .3     1.0               .1593 | .0578   .1015

                                      =      1.0000 | .5824
                                              .5824 | .5824

\[ \text{Multiple correlation} = \frac{.5824}{\sqrt{.5824}} = .763 = r_m \text{ say,} \]

which is higher than any other weighting will produce, if the reader 
cares to try others. Notice the peculiarity of the pooling 
square with regression coefficients as weights, that C = B 
(.5824 = .5824). We can deduce that the inner product of 
the regression coefficients with the correlation coefficients 
gives the square of the multiple correlation— 

\[ .604 \times .7 + .319 \times .5 = .583 = r_m^2 \]


Indeed, we can take this as forming one reason for using 
.604 and .319, and not any other numbers proportional to 
them, although the latter would give the same order of 
merit. We want our estimates of x not merely to be as 
highly correlated with the true values of x as is possible, 
but also to be equal to them on the average in the long 
run, in the sense that our overestimations will, in each 
array of men who have the same y and z, be as numerous 
as our underestimations, and this is achieved by using not 
merely .55 and .29 as weights, but .55 ÷ .91, and .29 ÷ .91. 
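
The rule of minors for two tests can be written out directly. The Python sketch below assumes the correlations .7, .5, and .3 of this example; the helper `det2` and the variable names are hypothetical and serve only the illustration.

```python
import math

def det2(a, b, c, d):
    """Determinant of the 2x2 array with rows (a, b) and (c, d)."""
    return a * d - b * c

r_xy, r_xz, r_yz = 0.7, 0.5, 0.3

# Correlation matrix with the criterion column deleted has rows
#   x: (r_xy, r_xz),  y: (1, r_yz),  z: (r_yz, 1).
minor_y = det2(r_xy, r_xz, r_yz, 1.0)      # y row crossed out: rows x and z remain
minor_z = -det2(r_xy, r_xz, 1.0, r_yz)     # z row crossed out, with the sign changed
divisor = det2(1.0, r_yz, r_yz, 1.0)       # criterion row and column both deleted

b_y = minor_y / divisor                    # regression coefficient of y
b_z = minor_z / divisor                    # regression coefficient of z

# Inner product of regression coefficients with criterion correlations.
r_m_squared = b_y * r_xy + b_z * r_xz
r_m = math.sqrt(r_m_squared)
print(round(b_y, 4), round(b_z, 4), round(r_m, 3))
```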

6. Aitken’s method of pivotal condensation.—When there 
are more than two tests y and z in the battery, the applica- 
tion of the above rules becomes increasingly laborious. It 
is desirable, therefore, to have a routine method of calcu- 
lating regression coefficients which will give the result as 
easily as possible even in the case of a team of many tests. 
The method we shall adopt (Aitken, 1937a) is based upon 
the calculation of tetrads, as already used in our Chapter V. 
We shall first calculate the above regression coefficients 
again by this method. Delete the criterion column in the 
matrix of correlations, transfer the criterion row to the bottom, 
and write the resulting oblong matrix in the top left-hand 
corner of the sheet of calculations, preferably on paper 
ruled in squares : 


                                              Check
                                              Column
              (1.0)     .3  |   −1       .   |    .3
    A           .3     1.0  |    .      −1   |    .3
                .7      .5  |    .       .   |   1.2

                      (.91) |   .3      −1   |   .21
    B                  1    |   .330   −1.099|   .231
                       .29  |   .7       .   |   .99

    C                       |   .604    .319 |   .923

On the right of the oblong matrix of correlation coefficients 
we rule a middle block of columns of the same 
number, here two, and on the right of all a check column. 
The columns of the middle block we fill with a pattern 
of minus ones diagonally as shown, leaving the other cells 
empty,* including the bottom row. In the check column 
we write the sum of each row. The top left-hand number 
of all we mark as the “ pivot.” Slab B of the calculation 
is then formed from slab A by writing down, in order as 
they come, all the tetrad-differences of which the pivot in 
A is one corner. Thus the first row of slab B is calculated 
thus— 


\[ \begin{aligned}
1 \times 1 - .3 \times .3 &= .91 \\
1 \times 0 - .3 \times (-1) &= .3 \\
1 \times (-1) - .3 \times 0 &= -1 \\
1 \times .3 - .3 \times .3 &= .21
\end{aligned} \]

and the row is checked by noting that .21 is the sum of the 
others. Immediately below this first row a second version 
of it is written, with every member divided by the first 
(.91). This is to facilitate the calculation of slab C by 
having unity again as a pivot. The second row of slab B is 
then formed, beginning with— 

\[ 1 \times .5 - .7 \times .3 = .29 \]


Throughout the whole calculation, except for the division 
of the first row, only one operation needs to be performed, 
namely the computing of tetrad-differences, beginning with 
the pivot. 

The same operation is then repeated to give slab C, 
using the modified first row of B, with pivot unity. 

This procedure goes on, slab after slab, until no numbers 
remain in the left-hand block. There being only three 
tests in all in our example, this happens at slab C. The 
middle block then gives the regression coefficients .604 and 
.319, with their proper signs, all ready for use. Throughout 
the calculation the check column detects any blunder in 
each row. The check, let me repeat, for I often find this 
misunderstood, consists in seeing that the appropriate 
tetrad from the sums in the previous slab agrees with the 
sum of the new row. Thus .99 is both the sum of its row, 
and also the tetrad— 

\[ 1 \times 1.2 - .7 \times .3 \]

from slab A. 

* The dots represent zeros. 

When the number of tests in the battery is large, the 
calculation of the regression coefficients is a laborious 
business, but probably less so by this method than by 
any other. It will be clear to the reader that so long a 
calculation is not worth performing unless the accuracy of 
the original correlation coefficients is high. Only very 
accurate values can stand such repeated multiplication, 
etc., without giving untrustworthy results. In other 
words, regression coefficients have a rather high standard 
error.* 
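
The condensation routine described in this section is mechanical enough to be sketched in code. The Python function below is one possible reading of the procedure, with each pivot converted to unity before the tetrad-differences are taken and with the check column carried along; the function name `aitken_regression` is hypothetical, full floating-point precision is kept instead of three decimals, and no provision is made for a zero pivot.

```python
def aitken_regression(R_tests, r_criterion):
    """Regression coefficients by pivotal condensation.

    R_tests: correlation matrix of the battery (list of rows).
    r_criterion: correlations of each battery test with the criterion.
    """
    n = len(R_tests)
    rows = []
    for i in range(n):
        middle = [-1.0 if j == i else 0.0 for j in range(n)]   # minus ones diagonally
        rows.append(list(R_tests[i]) + middle)
    rows.append(list(r_criterion) + [0.0] * n)                  # criterion row at the bottom
    rows = [row + [sum(row)] for row in rows]                   # check column

    for _ in range(n):                                          # one new slab per pass
        pivot_row = [v / rows[0][0] for v in rows[0]]           # pivot converted to unity
        new_rows = []
        for row in rows[1:]:
            # Tetrad-differences of which the pivot is one corner.
            new = [row[j] - row[0] * pivot_row[j] for j in range(1, len(row))]
            # The check entry must equal the sum of the rest of its row.
            assert abs(sum(new[:-1]) - new[-1]) < 1e-9
            new_rows.append(new)
        rows = new_rows
    return rows[0][:-1]                                         # drop the check entry

small = aitken_regression([[1.0, 0.3], [0.3, 1.0]], [0.7, 0.5])
print([round(b, 3) for b in small])
```

After n passes only the criterion row is left, and its middle-block entries are the regression coefficients, as in slab C of the small example.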

7. A larger example.—Next we give in full the calculation 
of the regression coefficients in a slightly larger example, 
though one still much smaller than a practical scheme of 
vocational advice would involve. Here z₀ is the 
“occupation,” and z₁, z₂, z₃, and z₄ are tests. To give the 
example an air of reality, these and their intercorrelations 
are taken from Dr. W. P. Alexander’s experimental study, 
Intelligence, Concrete and Abstract (Alexander, 1935). 
They were † : 

    z₁   Stanford-Binet test ; 
    z₂   A picture-completion test ; 
    z₃   Thorndike reading test ; 
    z₄   Spearman’s analogies test in geometrical figures. 



* Regression weights obtained from one set of data, applied to a 
subsequent set, will not usually give a correlation with the criterion 
as high as that predicted. The probable defect in its square will 
be (Wherry, 1931)— 

\[ (1 - r_m^2)(M - 1)/(N - M), \]
where N is the number of persons and M the number of tests. 

† In this, as in other instances where data for small examples are 
taken from experimental papers, neither criticism nor comment is 
in any way intended. Illustrations are restricted to few tests for 
economy of space and clearness of exposition, but in the experiments 
from which the data are taken many more tests are employed, and 
the purpose may be quite different from that of this book. 

But the occupation is a pure invention, for purposes of this 
illustration only. The correlation matrix is : 

              z₀      z₁      z₂      z₃      z₄
       z₀    1.00     .72     .63     .58     .41
       z₁     .72    1.00     .39     .69     .49
       z₂     .63     .39    1.00     .19     .27
       z₃     .58     .69     .19    1.00     .38
       z₄     .41     .49     .27     .38    1.00

The fact that we possess these correlations means that we 
have given these tests to a sufficiently large number of 
persons whose ability in the occupation is also known. 
The occupation can be looked upon as another test, in 
which marks can be scored. In an actual experiment, 
obtaining marks for these persons’ abilities in the occupa- 
tion is in fact one of the most difficult parts of the work. 
We can now find by Aitken’s method the best weights for 
Tests z₁ to z₄ to make their weighted sum correlate as 
highly as possible with z₀. For a reason which will be 
explained later, I have numbered the tests in the order of 
their correlations with the criterion. To make the arith- 
metic as easy as possible to follow in an illustration, the 
original correlation coefficients are given to two places of 
decimals only, and only three places of decimals are kept 
at each stage of the calculation. The previous explanation 
ought to enable the reader to follow. As an additional 
help, take the explanation of the value .454 in the middle of 
slab C. It is obtained thus from slab B— 


\[ 1 \times .490 - .079 \times .460 = .454 \]


and is typical of all the others. Except for the division 
of each first row, only one kind of operation is required 
through the whole calculation, which becomes quite 
mechanical. The numbers shown on the left in brackets 
are the reciprocals of .848, .517, .748, used as multipliers 
instead of dividing by the latter numbers, in obtaining the 
modified first rows. The process continues until the left-hand 
block is empty, when the regression coefficients 
appear in the middle block.* 

The result is that we find that the best prediction of a 
man’s probable success in this occupation is given by the 
regression equation— 

\[ \hat{z}_0 = .390z_1 + .431z_2 + .222z_3 + .018z_4 \]

COMPUTATION OF REGRESSION COEFFICIENTS 
Aitken’s Modified Method with Each Pivot converted to Unity 

                                                                            Check
            (1)     .39     .69     .49  |   −1       .       .       .   |  1.57
            .39    1        .19     .27  |    .      −1       .       .   |   .85
   A        .69     .19    1        .38  |    .       .      −1       .   |  1.26
            .49     .27     .38    1     |    .       .       .      −1   |  1.14
            .72     .63     .58     .41  |    .       .       .       .   |  2.34

 (1.179)          (.848)  −.079    .079  |   .390   −1        .       .   |   .238
                  1.000   −.093    .093  |   .460   −1.179    .       .   |   .281
   B              −.079    .524    .042  |   .690     .      −1       .   |   .177
                   .079    .042    .760  |   .490     .       .      −1   |   .371
                   .349    .083    .057  |  [.720]    .       .       .   |  1.209

 (1.936)                  (.517)   .049  |   .726   −.093   −1        .   |   .199
                          1.000    .096  |  1.406   −.180   −1.936    .   |   .386
   C                       .049    .753  |   .454    .093     .      −1   |   .349
                           .116    .025  |  [.559]  [.412]    .       .   |  1.112

 (1.337)                          (.748) |   .384    .102    .095   −1    |   .329
   D                              1.000  |   .514    .136    .128   −1.337|   .441
                                   .014  |  [.397]  [.433]  [.224]    .   |  1.068

   E                                     |   .390    .431    .222    .018 |  1.061
                                             Final Regression Coefficients

* The product of all the unconverted pivots, 1 × .848 × .517 × 
.748, is the value .328 of the determinant : 

\[ \begin{vmatrix} 1.00 & .39 & .69 & .49 \\ .39 & 1.00 & .19 & .27 \\ .69 & .19 & 1.00 & .38 \\ .49 & .27 & .38 & 1.00 \end{vmatrix} \]

If this alone were wanted, the middle block, and the criterion row, 
would, of course, be unnecessary. 

We give a candidate the four tests, reduce his scores 
to standard measure by dividing by the known standard 
deviation of each test, insert these standard scores into 
this equation, and obtain an estimated score for him in 
the occupation. Thus the following three young men 
could be placed in their probable order of efficiency in this 
occupation from their test scores : 


                 Standard Scores in

           z₁      z₂      z₃      z₄      ẑ₀
Tom Grá 0 2) 1 — 5 “B1 
Dick      −.4     −.8      .1      .3     −.47
Harry      .2     1.3      .8      .6      .83
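
Applying the regression equation to a candidate's standard scores is a single weighted sum. The short Python sketch below does this for two of the candidates, Dick and Harry, using the scores tabulated above; the dictionary layout and the name `estimate` are choices made only for this illustration.

```python
# Regression coefficients for tests z1 to z4, as found above.
WEIGHTS = [0.390, 0.431, 0.222, 0.018]

def estimate(standard_scores):
    """Estimated standard score in the occupation from the four test scores."""
    return sum(w * z for w, z in zip(WEIGHTS, standard_scores))

candidates = {
    "Dick":  [-0.4, -0.8, 0.1, 0.3],
    "Harry": [0.2, 1.3, 0.8, 0.6],
}
estimates = {name: estimate(scores) for name, scores in candidates.items()}

# Probable order of efficiency, best first.
ranking = sorted(estimates, key=estimates.get, reverse=True)
print({name: round(value, 2) for name, value in estimates.items()}, ranking)
```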


The multiple correlation of such estimates ẑ₀ with the 
true values would be obtained by inserting the four 
correlation coefficients— 

\[ .72 \quad .63 \quad .58 \quad .41 \]

instead of the z’s in the regression equation, and taking 
the square root, thus— 

\[ .390 \times .72 + .431 \times .63 + .222 \times .58 + .018 \times .41 = .68847 = r_m^2 \]

So r_m = .83. 

Finally, we can, as we did in the former example, use 
the regression weights on a pooling square and see if we 
obtain this same multiple correlation of r_m = .83 : 



                      .390    .431    .222    .018

              1.00 |   .72     .63     .58     .41
             ------+--------------------------------
    .390       .72 |  1.00     .39     .69     .49
    .431       .63 |   .39    1.00     .19     .27
    .222       .58 |   .69     .19    1.00     .38
    .018       .41 |   .49     .27     .38    1.00


It will be remembered that we have to multiply each 
row and column by its appropriate weight, and then sum 
all the numbers in each quadrant. The easiest way of 
doing this in large pooling squares is to multiply the rows 
first, then add the columns and multiply the totals by the 
column weights, finally adding these products, thus : 

Multiply the rows : 

                          .390     .431     .222     .018
             1.0000 |     .72      .63      .58      .41
              .2808 |     .3900    .1521    .2691    .1911
              .2715 |     .1681    .4310    .0819    .1164
              .1288 |     .1532    .0422    .2220    .0844
              .0074 |     .0088    .0049    .0068    .0180
    Sums      .6885 |     .7201    .6302    .5798    .4099


If we had kept all decimals these columnar sums would, 
since we are using regression coefficients as weights, have 
been exactly equal to the top row. With the actual figures 
shown, on multiplying the column totals and adding them, 
we find that the pooling square condenses to : 


             1.0000 | .6885
              .6885 | .6885

\[ r_m = \frac{.6885}{\sqrt{.6885}} = .83 \text{ as before.} \]

8. Using fewer tests.—There is a tendency, which common 
sense finds natural, for the regression coefficients of the 
tests of a battery to be in the same order of magnitude as 
their correlations with the criterion. But this is not 
invariably the case, and in the present example, if we 
compare the two sets— 

    correlations with criterion      .72     .63     .58     .41
    and regression coefficients     .390    .431    .222    .018,



we see that Test 2 has a higher regression coefficient than 
Test 1, although a lower criterion correlation. The reason 
lies in the high correlation of Test 1 with Test 3, .69. 
They measure to that extent the same thing, and when 
Test 3 is introduced into the battery it begins to some extent 
to put Test 1’s “ nose out of joint.” 

The boxed numbers in the calculation on page 205 are 
all regression coefficients. If only Test 1 is used, its 
regression coefficient is .72. If Tests 1 and 2 are used, 
their regression coefficients are .559 and .412. If Tests 1, 
2, and 3 are used, their regression coefficients are .397, 
.433, and .224. And if all four Tests are used, the four 
final numbers are the regression coefficients. 

The addition of each test raises the multiple correlation
$r_m$. We have—

                                                               $r_m^2$
Test 1                  .72 × .72                            = .5184
Tests 1 and 2           .72 × .559 + .63 × .412              = .6622
Tests 1, 2, and 3       .72 × .397 + .63 × .433 + .58 × .224 = .6882
Tests 1, 2, 3, and 4    .72 × .390 + .63 × .431 + .58 × .222
                          + .41 × .018                       = .6885
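The four stages above can be re-worked mechanically. The following Python sketch is an illustration only and is not part of the original text: it assumes the intercorrelations of the four tests are those of the matrix inverted in Section 9 of this chapter, and the helper names `solve` and `battery` are hypothetical. It solves the normal equations for Tests 1 to k and forms the squared multiple correlation as the sum of (regression coefficient × criterion correlation).

```python
# Intercorrelations of the four tests and their correlations with the criterion.
R = [[1.00, 0.39, 0.69, 0.49],
     [0.39, 1.00, 0.19, 0.27],
     [0.69, 0.19, 1.00, 0.38],
     [0.49, 0.27, 0.38, 1.00]]
criterion = [0.72, 0.63, 0.58, 0.41]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]
        for i in range(n):
            if i != k:
                f = m[i][k]
                m[i] = [vi - f * vk for vi, vk in zip(m[i], m[k])]
    return [m[i][n] for i in range(n)]


def battery(k):
    """Regression coefficients and squared multiple correlation for Tests 1..k."""
    sub = [row[:k] for row in R[:k]]
    coeffs = solve(sub, criterion[:k])
    r2 = sum(b * r for b, r in zip(coeffs, criterion))
    return coeffs, r2


results = [battery(k) for k in range(1, 5)]
for k, (coeffs, r2) in enumerate(results, start=1):
    print(k, [round(b, 3) for b in coeffs], round(r2, 4))
```

Because the machine keeps more decimals than the hand calculation, the last figure of a coefficient or of a squared correlation may differ by a unit from the values printed above.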


Although the addition of each test raises the multiple 
correlation, some do so only very little; and our caution 
in ordering the tests in accordance with the magnitude of 
the criterion correlation makes it probable, though not 
certain, that the comparatively useless tests will be the later 
ones. We can at each stage of the calculation pause and see 
whether the test we have just added makes a significant 
addition to the multiple correlation. We do this by an 
analysis of variance (see Lindquist, 1940, Chapter V, or 
other text-book). Consider, for example, the rise in the 
squared multiple correlation from .6622 to .6882. Is the
rise statistically significant? To decide this we must know
the number of persons tested, say N = 105.


                                      Degrees of    Mean
Test                      $r_m^2$     Freedom       Square    F
1 and 2                    .6622          2
Increment on adding 3      .0260          1          .0260    .0260 ÷ .0031 = 8.4
Residue                    .3118        101          .0031

Total                     1.0000     N − 1 = 104


The calculation is carried out in the above form, and the
decision whether the increment of $r_m^2$ is statistically
significant depends on the size of the ratio F. If it is large
enough, the increase is significant. To decide how large,
consult Table V in Fisher and Yates’s Statistical Tables,
where we find that, with degrees of freedom 1 and 101, a
ratio of 6.88 would be significant at the 1 per cent. point,
i.e. quite highly significant, and 8.4 is even larger than this.
So the increase due to the addition of Test 3 is well worth
while. A similar calculation for the further addition of
Test 4, producing a rise of .0003 in $r_m^2$, shows, as might be
expected, that this is not significant, for F is now less than
unity, and Tests 1, 2, and 3 are (with 105 cases) as good
as the whole battery.


                                      Degrees of    Mean
Test                      $r_m^2$     Freedom       Square    F
1, 2, and 3                .6882          3
Increment on adding 4      .0003          1          .0003    Less than unity.
Residue                    .3115        100          .0031

Total                     1.0000        104
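The two analyses of variance reduce to one small formula. The sketch below is an illustration in Python, not part of the original text, and the function name `f_ratio` is hypothetical. It assumes, as in the tables, that the increment carries one degree of freedom and that the residue has N − p − 1 degrees of freedom, where p is the number of tests after the addition.

```python
def f_ratio(r2_before, r2_after, n_persons, n_tests_after):
    """F for the increment in squared multiple correlation on adding one test."""
    increment = r2_after - r2_before            # one degree of freedom
    residual_df = n_persons - n_tests_after - 1
    residual_mean_square = (1.0 - r2_after) / residual_df
    return increment / residual_mean_square


# Squared multiple correlations quoted in the text, with N = 105.
f_adding_test_3 = f_ratio(0.6622, 0.6882, 105, 3)
f_adding_test_4 = f_ratio(0.6882, 0.6885, 105, 4)
print(round(f_adding_test_3, 1), round(f_adding_test_4, 2))
```

The first ratio is then compared with the tabled 1 per cent. value of 6.88 quoted above; the second falls below unity.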


9. Calculation of a reciprocal matrix.—A somewhat longer
method of calculating regression coefficients has two
advantages : it permits the easy calculation of regression
coefficients for any criterion (or many) when once the main
part of the computation is completed, and, what is of great
importance, it enables the standard errors of the coefficients,
and of their differences, to be found quickly.

The method referred to is to find first of all the reci- 
procal of the matrix of correlations of the tests. This is 
done by pivotal condensation also, as illustrated in the 
table overleaf. The matrix whose reciprocal is required 
appears in the top left-hand corner, with a diagonal array 
of minus ones on its right, and a diagonal of plus ones 
below it. The whole is condensed in the manner already 
described on page 205, and the required reciprocal matrix
appears at the bottom. It can be checked by multiplying
together the original matrix and its reciprocal, which 
ought to give a matrix with ones in the diagonal and zeros 
everywhere else, the unit matrix as it is called. 


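  (1)      .39     .69     .49  |  −1       .       .       .
   .39    1        .19     .27  |   .      −1       .       .
   .69     .19    1        .38  |   .       .      −1       .
   .49     .27     .38    1     |   .       .       .      −1
  1        .       .       .    |   .       .       .       .
   .      1        .       .    |   .       .       .       .
   .       .      1        .    |   .       .       .       .
   .       .       .      1     |   .       .       .       .

         (.848)  −.079    .079  |   .390   −1       .       .
         1.000   −.093    .093  |   .460   −1.179   .       .
         −.079    .524    .042  |   .690    .      −1       .
          .079    .042    .760  |   .490    .       .      −1
         −.390   −.690   −.490  |  1        .       .       .
         1        .       .     |   .       .       .       .
          .      1        .     |   .       .       .       .
          .       .      1      |   .       .       .       .

                 (.517)   .049  |   .726   −.093   −1       .
                 1.000    .095  |  1.406   −.180   −1.936   .
                  .049    .753  |   .454    .093    .      −1
                 −.726   −.454  |  1.179   −.460    .       .
                  .093   −.093  |  −.460   1.179    .       .
                 1        .     |   .       .       .       .
                  .      1      |   .       .       .       .

                         (.748) |   .384    .102    .095   −1
                         1.000  |   .514    .136    .128   −1.337
                         −.384  |  2.201   −.591   −1.406   .
                         −.102  |  −.591   1.196    .180    .
                         −.095  | −1.406    .180   1.936    .
                         1      |   .       .       .       .

                                   2.397   −.539   −1.357  −.514
                                   −.539   1.210    .193   −.136
                                  −1.357    .193   1.948   −.128
                                   −.514   −.136   −.128   1.337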


The reader will see that space could be saved in the 
calculation by omitting the rows containing ones only ;
and also that nearly half the numbers can be written down
from symmetry.

The regression coefficients for any criterion are then 
obtained by multiplying the rows of the reciprocal by the 
criterion correlations and then adding the columns. In 
the example of page 205 we multiply the first row of the
reciprocal by .72, the second by .63, and so on. The
addition of the columns then gives the same regression
coefficients as were found on page 205.
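The reciprocal, its check against the unit matrix, and the regression coefficients derived from it can be sketched as follows. This is an illustration in Python and not the pivotal condensation layout of the text: it uses ordinary Gauss-Jordan elimination, which leads to the same reciprocal, and the function name `inverse` is hypothetical.

```python
R = [[1.00, 0.39, 0.69, 0.49],
     [0.39, 1.00, 0.19, 0.27],
     [0.69, 0.19, 1.00, 0.38],
     [0.49, 0.27, 0.38, 1.00]]
criterion = [0.72, 0.63, 0.58, 0.41]
n = len(R)


def inverse(a):
    """Reciprocal of a square matrix by Gauss-Jordan elimination."""
    size = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
         for i, row in enumerate(a)]
    for k in range(size):
        p = max(range(k, size), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]
        for i in range(size):
            if i != k:
                f = m[i][k]
                m[i] = [vi - f * vk for vi, vk in zip(m[i], m[k])]
    return [row[size:] for row in m]


reciprocal = inverse(R)

# Check: original matrix times reciprocal should be the unit matrix.
product = [[sum(R[i][k] * reciprocal[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]

# Multiply each row of the reciprocal by its criterion correlation, add the columns.
coefficients = [sum(criterion[i] * reciprocal[i][j] for i in range(n))
                for j in range(n)]
print([round(b, 3) for b in coefficients])
```

With full machine precision the elements of the reciprocal may differ in the third decimal place from those obtained by hand with three-figure working.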

10. Variances and covariances of regression coefficients.—
The most important advantage of this method is that 
whatever the criterion, the variances and covariances of the 
regression coefficients are proportional to the cells of the 
above reciprocal matrix (Fisher, 1925, 15 and 1922, 611). 
This enables their absolute values for any given criterion 
to be obtained by multiplying by $1 - r_m^2$ (the defect of
the square of the multiple correlation from unity), and
dividing by the number of “degrees of freedom” which
is for full correlations N − p − 1 where N is the number
of persons tested, and p the number of tests. For partial
correlations the degrees of freedom are reduced by the
number of variables “partialled out.”

Thus in our example, where p = 4, if N had been 105,
N − p − 1 would be 100. The multiple correlation was
.83, and $1 - r_m^2$ = .312 (see page 206). The variances and
covariances of our four regression coefficients are in this
case equal to the reciprocal matrix multiplied by .00312.

     .0075   −.0017   −.0042   −.0016
    −.0017    .0038    .0006   −.0004
    −.0042    .0006    .0061   −.0004
    −.0016   −.0004   −.0004    .0042


The standard errors of the regression coefficients are the
square roots of the diagonal elements :

    Regression coefficients    .390    .431    .222    .018
    Standard errors            .087    .062    .078    .065
    Significant ?              Yes     Yes     Yes     No

The correlations of the regression coefficients will be got
by dividing each row and column by the square root of
the diagonal element. We obtain :

     1.00   −.31   −.62   −.28
     −.31   1.00    .12   −.10
     −.62    .12   1.00   −.08
     −.28   −.10   −.08   1.00


We can now calculate the standard error of the difference
between any pair of the regression coefficients and see
whether they differ significantly. Take, for example, those
for Test 1 (.390) and Test 2 (.431). The difference is .041.
Its standard error is the square root of

    .0075 + .0038 + 2 × .31 × .087 × .062 = .0146
    ∴ standard error of .041 is .121

The difference is therefore not significant when N = 105.
Had N been larger it might have been.
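The whole of this section can be followed numerically in a few lines. The sketch below is an illustration in Python, not part of the original text; it takes the reciprocal matrix as printed to three decimals (with the corner element written symmetrically), the value .312 for the defect of the squared multiple correlation, and N = 105. The name `se_of_difference` is hypothetical.

```python
import math

reciprocal = [[2.397, -0.539, -1.357, -0.514],
              [-0.539, 1.210, 0.193, -0.136],
              [-1.357, 0.193, 1.948, -0.128],
              [-0.514, -0.136, -0.128, 1.337]]
one_minus_r2 = 0.312          # value quoted in the text
n_persons, n_tests = 105, 4

# Variances and covariances: reciprocal x (1 - r_m^2) / (N - p - 1).
scale = one_minus_r2 / (n_persons - n_tests - 1)
covariances = [[scale * v for v in row] for row in reciprocal]
standard_errors = [math.sqrt(covariances[i][i]) for i in range(n_tests)]


def se_of_difference(i, j):
    """Standard error of the difference between coefficients i and j."""
    variance = covariances[i][i] + covariances[j][j] - 2.0 * covariances[i][j]
    return math.sqrt(variance)


se_1_2 = se_of_difference(0, 1)
print([round(s, 3) for s in standard_errors], round(se_1_2, 3))
```

Subtracting twice the (negative) covariance is the same step as adding 2 × .31 × .087 × .062 in the text, since the covariance is the correlation multiplied by the two standard errors.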

11. The geometrical picture of regression.—Before we close 
this chapter it will be illuminating to consider what re- 
gression and estimation mean in terms of the geometrical 
picture of Chapter VI. Consider the illustration used in 
the earlier pages of the present chapter, with the matrix :

             x      y      z
      x     1.0     .7     .5
      y      .7    1.0     .3
      z      .5     .3    1.0

Here x is the criterion, y and z are the tests. Each of 
them can be represented by a directed line, as explained in 
Chapter VI, with angles between these lines such that 
their cosines are the above correlations. The three lines 
will then be in an ordinary space of three dimensions. 

The two tests y and z themselves have, of course, lines 
which lie in a plane: any two lines springing from the 
same point as origin lie in a plane. The criterion line x 
is not in this plane (say, the table top, on which we may 
imagine lines y and z to lie), but makes an angle with it. 
The problem of regression and multiple correlation is, in
terms of this geometrical picture, to find the line in the plane
of y and z which makes the smallest possible angle with the
line x: for the smallest possible angle corresponds to the
largest possible correlation. Clearly this desired line is the
line which is the projection of the line x on to the yz plane,
the shadow thrown by x on the table with the sun right
overhead. In Figure 29 it is the line OB, where B is
vertically below a point A on the criterion line x.

The regression coefficients are numbers which express
the proportions in which the tests y and z have to be
combined to give this line OB. It is just like the parallelogram
of forces. If from B we draw parallels to the two test lines,
we obtain OY and OZ as the distances to be measured along
the two test lines to give a resultant along OB, which is as
near as we can come to OA. (No combination of y and z
can give a line out of their plane.) If the distance OA is
taken as unity, the distances OY and OZ are the actual
regression coefficients. If a wire model like Figure 29 were
made with the proper angles with cosines x with y equal to
.7, x with z equal to .5, and y with z equal to .3, the distances
OY and OZ would be found to be .6044 and .3187. And
the cosine of the angle BOA would be .763, the value we
found for the multiple, or highest possible, correlation of
the two-test battery with x in Section 5 of this chapter,
page 200.

Figure 29.
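In place of the wire model, the three lines can be set up as unit vectors and the distances measured by arithmetic. The sketch below is an illustration in Python, not part of the original text; the particular co-ordinates (y along the first axis, z in the plane of the table) are an arbitrary choice, any set with the same cosines would serve.

```python
import math

r_xy, r_xz, r_yz = 0.7, 0.5, 0.3

# Unit vectors whose cosines are the correlations; y and z lie on the "table".
y = (1.0, 0.0, 0.0)
z = (r_yz, math.sqrt(1.0 - r_yz ** 2), 0.0)
x1 = r_xy                                  # x . y = .7
x2 = (r_xz - r_yz * x1) / z[1]             # chosen so that x . z = .5
x3 = math.sqrt(1.0 - x1 ** 2 - x2 ** 2)    # height of A above the table
x = (x1, x2, x3)

# B is the point vertically below A (OA = 1), i.e. x with its height removed.
b_point = (x1, x2, 0.0)

# Resolve OB along the two test lines: OB = OY * y + OZ * z.
oz = b_point[1] / z[1]
oy = b_point[0] - oz * z[0]

# Since OA is of unit length, cos BOA is simply the length of OB.
cos_boa = math.hypot(b_point[0], b_point[1])
print(round(oy, 4), round(oz, 4), round(cos_boa, 3))
```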

12. Estimation the same as projection.—Let us now
consider a man P whose two scores in the Tests y and z we
know, and whose probable score in Test x we wish to
estimate. His two scores OM and ON in y and z enable
us to assign to this man a point P on the yz plane, a point 
so chosen that its projections on to the y and z vectors 
give the scores made by him in those tests (see Figure 30). 
But we cannot say that this is his point in the three-dimensional 
space of x, y, and z. His point in that space may be
anywhere on a line P′PP″ at right angles to the plane yz. For
from anywhere on that line, projections on to y and z fall
on the points M and N. Yet the projection on to the
vector x, which gives his score in the criterion test x,
depends very much on the position of his point on the line
P′PP″. All the people represented by points on that line
have the same scores in y and z but different scores in x,
and our man may be any one of them. Before deciding
what to do in these circumstances, let us consider this set
of people P′PP″ in more detail.

Figure 30.

It will be remembered that the whole population of
persons is represented by a spherical swarm of points,
crowded together most closely round about the origin O,
and falling off in density equally in all directions from
that point. Every test line is a diameter of this sphere,
and the plane containing any two test vectors divides the
spherical swarm into equal hemispheres. It follows that
a line like P′PP″ is a chord of the sphere at right angles to
a diameter (the line OP), and consequently that it is
peopled symmetrically on both sides of P, both upwards
along PP′ in our figure, and downwards along PP″, the
men on the line being most crowded near the point P itself.
The average man of the array of men P′PP″ (who are all
alike in their scores in the two tests y and z) is therefore
the man at P, and since we do not know exactly where
our candidate’s point is along P′PP″, we take refuge in
guessing that he is the average man of his group and is at
the point P itself. From P, therefore, we drop a
perpendicular on to the vector x, and take the distance OL as
representing his estimated score in that test. This
geometrical procedure corresponds exactly to the calculation
we made, as a little solid trigonometry will show the
mathematical reader. The non-mathematical reader must
take it on trust, but the model may illuminate the
calculation. OL is the average of all the different scores x that a
person with scores OM and ON can have. The estimate
will only be certain if the line x itself is on the table; it
will be less and less certain, the more the line x is inclined
to the table.

It should be noted that the angles which three test 
vectors make with each other are impossible angles, if the 
determinant of the matrix of correlations becomes negative. 
Ordinarily, that determinant is positive. In our present
example we have, for example :

$$\begin{vmatrix} 1.0 & .7 & .5 \\ .7 & 1.0 & .3 \\ .5 & .3 & 1.0 \end{vmatrix} = .38$$


Such a determinant, however, though it cannot be
negative, can be zero, namely in the cases where the two
smaller angles together exactly equal the largest. In that case the
three vectors lie in one plane—the criterion line has
sunk until it too lies on the table. In that case alone,
when the determinant is zero, the “estimation” is certain,
and all the people in the line P′PP″ have not only the same
scores in y and z, but also the same scores in x. The
vanishing of the above determinant therefore shows that
this is so. And in more than three dimensions, although
we can no longer make a model, the vanishing of the
determinant :


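$$\begin{vmatrix}
1 & r_{01} & r_{02} & r_{03} & \cdots & r_{0n} \\
r_{01} & 1 & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{02} & r_{12} & 1 & r_{23} & \cdots & r_{2n} \\
r_{03} & r_{13} & r_{23} & 1 & \cdots & r_{3n} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
r_{0n} & r_{1n} & r_{2n} & r_{3n} & \cdots & 1
\end{vmatrix} = \Delta, \text{ say,}$$

shows that the criterion $z_0$ can be exactly estimated from
the team $z_1, z_2 \ldots z_n$. In fact, the multiple correlation
$r_m$, which we have already learned to calculate in another
way, can also be calculated as—

$$r_m = \sqrt{1 - \frac{\Delta}{\Delta_{00}}}$$

where $\Delta$ is the whole determinant, and $\Delta_{00}$ is the minor
left after deleting the criterion row and column. This
expression clearly becomes equal to unity when $\Delta = 0$.
In our small example x, y, z, we have—

$$\Delta = .38, \qquad \Delta_{00} = .91$$

$$r_m = \sqrt{1 - \frac{.38}{.91}} = \sqrt{\frac{.53}{.91}} = \sqrt{.5824} = .763$$

as we already know it to be from page 200.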
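The determinantal route to the multiple correlation is easily verified on the small example. The sketch below is an illustration in Python, not part of the original text; the function name `det` is hypothetical, and cofactor expansion is used only because the matrices are small.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


# Criterion x in the first row and column, tests y and z after it.
corr = [[1.0, 0.7, 0.5],
        [0.7, 1.0, 0.3],
        [0.5, 0.3, 1.0]]

delta = det(corr)
delta_00 = det([row[1:] for row in corr[1:]])   # criterion row and column deleted
r_m = math.sqrt(1.0 - delta / delta_00)
print(round(delta, 2), round(delta_00, 2), round(r_m, 3))
```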

13. The “ centroid” method and the pooling square.—The 
pooling square, which we have learned to use in this 
chapter, enables us to see in another light the nature of the 
factors first arrived at by the “ centroid ” method. 


                           Equal Weights

                     z1     z1     z2     z3     z4
             z1      1      1      r12    r13    r14
             z1      1      1      r12    r13    r14
   Equal     z2      r12    r12    1      r23    r24
   Weights   z3      r13    r13    r23    1      r34
             z4      r14    r14    r24    r34    1

Let us suppose that the tests $z_1$, $z_2$, $z_3$, and $z_4$ have the
correlations shown, and let us by the aid of a pooling square
find the correlation of each of them with the average of all.
This means giving each test an equal weight in pooling it.

The correlation of $z_1$ with the average of all is then
obtained from the above pooling square (see previous page),
which condenses to :

            1               |   1 + r12 + r13 + r14
    ------------------------+------------------------------
    1 + r12 + r13 + r14     |   Sum of all the cells of the
                            |   table of correlations.

and the correlation coefficient is—

$$\frac{1 + r_{12} + r_{13} + r_{14}}{\sqrt{\text{above sum}}}$$

This, however, is exactly the centroid or simple summation
process applied to a table with full communalities of
unity. The first centroid factor obtained from such a table
is simply for each individual the average of his four test
scores, and the method is called the “centroid” method,
because “centroid” is the multi-dimensional name for an
average (Vectors, Chapter III; and see Kelley, 1935, 59).
The line in our geometrical picture, which represents the 
first centroid factor, is in the midst of the radiating lines 
which represent the tests, like the stick of a half-opened 
umbrella among the ribs. It does not, however, make 
equal angles with the test lines unless these all make 
equal angles with each other. If several of them are 
clustered together, and the others spread more widely, 
the factor will lean nearer to the cluster. 
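The identity between the pooling-square result and the simple summation process can be seen on any table of correlations. The sketch below is an illustration in Python, not part of the original text; for want of numerical values in this section it borrows, as an assumption, the four-test intercorrelations used earlier in the chapter, with unity in the diagonal.

```python
import math

# Illustrative table with full communalities (unit diagonal).
R = [[1.00, 0.39, 0.69, 0.49],
     [0.39, 1.00, 0.19, 0.27],
     [0.69, 0.19, 1.00, 0.38],
     [0.49, 0.27, 0.38, 1.00]]

column_sums = [sum(row[j] for row in R) for j in range(len(R))]
total = sum(column_sums)                 # sum of all the cells of the table

# Correlation of each test with the equally weighted average of all tests,
# which is also its loading on the first centroid factor.
loadings = [s / math.sqrt(total) for s in column_sums]
print([round(v, 3) for v in loadings])
```

Tests 1 and 3, which correlate highly with one another, receive the largest loadings here, in keeping with the remark that the factor leans towards a cluster.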

In the foregoing explanation the communalities have 
been taken as unity, and the factor axis was pictured in 
the midst of the test lines. If smaller communalities 
are used, the only difference is that a specific component 
of each test is discarded, and the first-factor axis must be
pictured as in the midst of the lines representing the other
components of the tests. It can be shown that when 
communalities less than unity are used, if we bear in mind 
that the communal components of the tests are not then 
standardized, the pooling square gives the correlations 
exactly as before, if we use communalities instead of units 
in the diagonal. 

The first centroid factor is the average of the communal 
parts of the tests. 

The later factors in their turn are, in a sense, averages 
of the residues. There are, however, some complications, 
the first being that the average of the residues just as they 
stand is zero. The manner in which Thurstone circum- 
vents this has already been described in Chapter V. 

14. The most predictable criterion.—Often a criterion is
also composed of parts, just as a battery of tests is. If it is
success in an occupation, the journeyman may be judged
for skill, for regularity of attendance, for his manner in
dealing with colleagues or customers, etc. Some of these
items will consciously or unconsciously be weighted more 
heavily than others in an adjudicator’s assessment of the 
man ; and so too in the assessment of a boy’s success in a 
secondary school. If the weights are thus decided by 
employer, or by headmaster, the criterion score becomes 
again one number, the sum of the arbitrarily weighted 
parts. 

Hotelling, however, raised and solved the question of 
how to weight the parts of a criterion so that it would 
correlate most highly with a given battery of tests, also 
weighted in its best way (Hotelling, 1935a, and see Thomson 
1947, 1948, and M. S. Bartlett, 1948). There are, then, 
indeed two weighted batteries. In terms of our geometrical
analogy, the criterion is now no longer a line, as in
Figures 29 and 30, but a space, and the problem is to find 
a line in the criterion space, and one in the battery space, 
which will be as near to each other as possible, both spring- 
ing from an origin O common to both spaces. This tech- 
nique, which the reader will find illustrated by an 
arithmetical example in Thomson (1947, 1948), would, for 
instance, enable weights to be given to the tests in two
different batteries to make these agree with one another as
much as possible.

15. Weighting for battery reliability.—A special case
arises when the two batteries are composed of alternative 
forms of the same tests, when the correlation between the 
two batteries is the battery reliability, which can be 
enhanced by suitable weighting. 

Thomson (1940) described how to find the best weights 
for battery reliability, as a special case of Hotelling’s 
“most predictable criterion,” and Peel (1947) has given a 
simpler formula than Thomson’s (see page 353 in the 
Mathematical Appendix, Section 9a). If there are only
two tests in the battery, with reliabilities $r_{11}$, $r_{22}$ and
correlating with one another $r_{12}$, then Peel’s formula gives
as the maximum attainable reliability the largest root $\mu$ of
the equation

$$\begin{vmatrix} r_{11} - \mu & r_{12}(1 - \mu) \\ r_{12}(1 - \mu) & r_{22} - \mu \end{vmatrix} = 0$$

that is

$$\mu^2 (1 - r_{12}^2) - \mu (r_{11} + r_{22} - 2 r_{12}^2) + (r_{11} r_{22} - r_{12}^2) = 0$$

If, for example, $r_{12}$ = .5, $r_{11}$ = .7, and $r_{22}$ = .8, the quadratic
has roots .843 and .490, and a battery reliability of .843
is attainable by using weights proportional to either row of
the above determinant with $\mu$ = .843, taken reversed and
with alternate signs, that is

       .0785 and .1431
    or .0431 and .0785
    or 1 and 1.8 approximately.

If as a check we set out a pooling square for the two
batteries it will be—

                  1      1.8   |   1      1.8
         1        1      .5    |   .7     .5
         1.8      .5     1     |   .5     .8
         ------------------------------------
         1        .7     .5    |   1      .5
         1.8      .5     .8    |   .5     1

and if we multiply the rows and columns by the weights
shown, and add together the quadrants, this reduces to—

         6.04   |  5.092
         5.092  |  6.04


giving a battery self-correlation or reliability of—

$$\frac{5.092}{6.04} = .843 \text{ as expected.}$$

When there are more than two tests, the solution of the
above determinantal equation becomes laborious and
difficult. Green (1950) has given a transformation of the
equation which enables an iterative process to be used in
its solution, making it more practicable (see the
Mathematical Appendix, page 353).

Clearly the weights making a battery as reliable as
possible will not be the same as those making it most valid
in predicting a given criterion. There is here a conflict
of aims, for we want a battery to be both as valid and as
reliable as possible. It is very desirable that some
reasonably simple form of calculation should be devised to find
those weights which should be given to the tests of a battery
which, for a given criterion, would make the best
compromise, making reliability equal to validity and both as
great as possible (see Thomson 1940, pages 364 to 365).
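For the two-test example of this section the quadratic, the weights, and the pooling-square check can be run together. The sketch below is an illustration in Python, not part of the original text; it assumes, as the pooling square of the example does, that each test correlates .5 with the other test whether in the same or in the alternative form, and the function name `reliability` is hypothetical.

```python
import math

r11, r22, r12 = 0.7, 0.8, 0.5      # two reliabilities and the intercorrelation

# Quadratic: mu^2 (1 - r12^2) - mu (r11 + r22 - 2 r12^2) + (r11 r22 - r12^2) = 0
a = 1.0 - r12 ** 2
b = -(r11 + r22 - 2.0 * r12 ** 2)
c = r11 * r22 - r12 ** 2
root = math.sqrt(b * b - 4.0 * a * c)
mu_max = (-b + root) / (2.0 * a)
mu_min = (-b - root) / (2.0 * a)

# First row of the determinant, reversed and with alternate signs.
w1 = r12 * (1.0 - mu_max)
w2 = mu_max - r11


def reliability(u, v):
    """Correlation between the two weighted batteries (pooling-square check)."""
    between = u * u * r11 + 2.0 * u * v * r12 + v * v * r22
    within = u * u + 2.0 * u * v * r12 + v * v
    return between / within


print(round(mu_max, 3), round(mu_min, 3), round(w2 / w1, 2),
      round(reliability(1.0, 1.8), 3))
```

With the exact weights the pooling square returns the root itself; the rounded weights 1 and 1.8 return it to three decimals.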


CHAPTER XV 
THE ESTIMATION OF A MAN’S FACTORS 


1. Estimating a man’s “ g.”—So far, our discussion of 
estimation in Chapter XIV has had nothing immediate to 
do with factorial analysis. We are next, however, going 
to apply these principles of estimation to the problem of 
estimating a man’s factors, given his test scores. As we 
have already explained in Chapter VII, there is no need to 
“estimate” factors when unity is retained in each diagonal 
cell; they can be calculated without any loss of exactness 
because they are equal in number to the tests: and even 
if we analyse out only a few of them, they can be exactly
calculated for a man from his test scores. When we say 
exactly here, we mean that the factors are known with the 
same exactness as the test scores which are our data. 

When communalities are used, however, factors are 
more numerous than the tests, and can therefore only be 
“ estimated.” Two men with the same set of test scores 
may have different factors. All we can do is to estimate 
them, and since the test scores of the two men are the 
same, our estimates of their most probable factors will 
be the same. The problem does not differ essentially 
from the estimation of occupational success or of ability in 
any “criterion” test. The loadings of a factor in each
test give the $z_0$ row and column of the correlation matrix.
Let us first consider the case of a hierarchical battery of 
tests, and the estimation of g, taking for our example 
the first four tests of the Spearman battery used as illustra- 
tion in Chapter I, with these correlations : 


             z1      z2      z3      z4
      z1    1.00     .72     .63     .54
      z2     .72    1.00     .56     .48
      z3     .63     .56    1.00     .42
      z4     .54     .48     .42    1.00

These correspond, in the analogy with the ordinary cases
of estimation of the first chapter of this part, to the tests
given to a candidate. In those cases, however, there was
a real criterion whose correlations with the team of tests
were known, and formed the $z_0$ row and column of the
matrix. Here the “criterion” is g, and it cannot be
measured directly ; it can only be estimated in the manner
we are now about to describe. We have here, therefore,
no row and column of experimentally measured correlations
for the criterion $z_0$ or g in the present case (Thomson,
B.J.P. 25, 94). From the hierarchical matrix of
intercorrelations of the tests, however, we can calculate the
“saturation” or “loading” of each test with the
hypothetical g, and use these for our criterion column and row
of correlations. These saturations are the correlation
coefficients which would be found between each test and a test
of pure g with no specific. We thus arrive at the matrix :


             z0      z1      z2      z3      z4
      z0    1.00     .90     .80     .70     .60
      z1     .90    1.00     .72     .63     .54
      z2     .80     .72    1.00     .56     .48
      z3     .70     .63     .56    1.00     .42
      z4     .60     .54     .48     .42    1.00

and we want to know the best-weighted combination of
the test scores $z_1$ to $z_4$ in order to correlate most highly
with $z_0 = g$. The problem is now the same as one of
ordinary estimation of ability in an occupation, and the
mathematical answer is the same. We can, for example,
use Aitken’s method of finding the regression coefficients,
although in this case, because of the hierarchical qualities
of the matrix, there is, as we shall shortly see, an easier
method. It is, however, illuminating for the student
actually to work out the regression coefficients as in an
ordinary case of estimation, as shown on the next page.

                                                                                  Check
           (1.00)    .72      .63      .54   | −1.00     .        .        .    |  1.89
             .72    1.00      .56      .48   |   .     −1.00      .        .    |  1.76
             .63     .56     1.00      .42   |   .       .      −1.00      .    |  1.61
             .54     .48      .42     1.00   |   .       .        .      −1.00  |  1.44
             .90     .80      .70      .60   |   .       .        .        .    |  3.00

(2.0764)           (.4816)    .1064    .0912 |  .72    −1.00      .        .    |   .3992
                   1.0000     .2209    .1894 | 1.4950  −2.0764    .        .    |   .8289
                    .1064     .6031    .0798 |  .63      .      −1.00      .    |   .4193
                    .0912     .0798    .7084 |  .54      .        .      −1.00  |   .4194
                    .1520     .1330    .1140 |  .90      .        .        .    |  1.2990

(1.7253)                     (.5796)   .0596 |  .4709    .2209  −1.00      .    |   .3311
                             1.0000    .1028 |  .8124    .3811  −1.7253    .    |   .5712
                              .0597    .6911 |  .4037    .1894    .      −1.00  |   .3438
                              .0994    .0852 |  .6728    .3156    .        .    |  1.1730

(1.4599)                              (.6850)|  .3552    .1666    .1030  −1.00  |   .3097
                                      1.0000 |  .5186    .2432    .1504  −1.4599|   .4521
                                       .0750 |  .5920    .2777    .1715    .    |  1.1162

                                                .5531    .2595    .1602    .1095|  1.0823
                                                    Regression Coefficients

If, therefore, we know the scores $z_1$, $z_2$, $z_3$, and $z_4$ which
a man has made in these four tests, we can estimate his g
by the equation—

$$\hat{g} = .5531 z_1 + .2595 z_2 + .1602 z_3 + .1095 z_4$$


The multiple correlation of such estimates in a large
number of cases with the true values of g will be by analogy
with our former case given by—

$$r_m^2 = .5531 \times .90 + .2595 \times .80 + .1602 \times .70 + .1095 \times .60 = .883$$

$$r_m = .940$$
We must remember, however, that such a correlation here 
is rather a fiction. We had in the former case the possi- 
bility of comparing our estimates with the candidate’s 
eventual performance in the occupation or criterion zo. 
Here we have no way of knowing g; we only have the 
estimates. 

As before, we can check the whole calculation by a 
pooling square (see page 200). 
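Such a pooling-square check can be sketched as follows. The code is an illustration in Python, not part of the original text: the four tests, weighted by the regression coefficients, form one quadrant, g alone forms the other, and the correlation is the cross quadrant divided by the square root of the product of the two diagonal quadrants (that of g being unity).

```python
import math

R = [[1.00, 0.72, 0.63, 0.54],
     [0.72, 1.00, 0.56, 0.48],
     [0.63, 0.56, 1.00, 0.42],
     [0.54, 0.48, 0.42, 1.00]]
saturations = [0.90, 0.80, 0.70, 0.60]        # correlations of the tests with g
weights = [0.5531, 0.2595, 0.1602, 0.1095]    # regression coefficients
n = len(weights)

# Cross quadrant: weighted sum of the correlations with g.
cross = sum(w * s for w, s in zip(weights, saturations))

# Test quadrant: every cell multiplied by its row weight and its column weight.
test_quadrant = sum(weights[i] * weights[j] * R[i][j]
                    for i in range(n) for j in range(n))

correlation = cross / math.sqrt(1.0 * test_quadrant)
print(round(cross, 4), round(test_quadrant, 4), round(correlation, 3))
```

Because regression coefficients are used as weights, the two quadrants agree (apart from rounding), just as the columnar sums agreed with the top row in Chapter XIV.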

Estimating g from a hierarchical battery is therefore, 
mathematically, exactly the same problem as estimating 
any criterion, and can be done arithmetically in the same 
way. Because of the special nature of the hierarchical
matrix of correlations, however, with its zero
tetrad-differences, there is an easier way of calculating the estimate
of g, due to Professor Spearman himself (Abilities, xviii). 
For its equivalence mathematically to the above see 
Appendix, paragraph 10. 

Meanwhile we shall illustrate it by an example which 
will at least show that it is equivalent in this instance. 
The calculation is best carried out in tabular form, and is 
based entirely on the saturations or loadings of the tests 
with g, which are also their correlations with g. 


                                                                Regression Coefficients
  Test    r_ig    r_ig²    1 − r_ig²    r_ig²/(1 − r_ig²)    r_ig/(1 − r_ig²)    r_ig/(1 − r_ig²) ÷ (1 + S)

   1       .9      .81       .19            4.2632               4.7368               .5531
   2       .8      .64       .36            1.7778               2.2222               .2595
   3       .7      .49       .51             .9608               1.3725               .1603
   4       .6      .36       .64             .5625                .9375               .1095

                                       S =  7.5643
                                   1 + S =  8.5643


The result, with much less calculation, is the same.
The quantity S is of some importance in this formula. It
is formed in the fourth column of the table, from which
it will be seen that—

$$S = \sum \frac{r_{ig}^2}{1 - r_{ig}^2}$$

It is clear that S will become larger and larger as the
number of tests is increased.

Now, we saw that the square of the multiple correlation
$r_m$ is obtained when we multiply each of the weights by $r_{ig}$
and sum the products. That is to say—

$$r_m^2 = \sum (\text{weight} \times \text{saturation}) = \sum \frac{r_{ig}^2}{1 - r_{ig}^2} \div (1 + S) = \frac{S}{1 + S}$$

This fraction will be the nearer to unity, the larger S is ;
and we can make S larger and larger by adding more and
more (hierarchical) tests to the team. Thus in theory we
can make a team to give as high a multiple correlation
with g as we desire. It will also be noticed, however,
from our table that the tests with high g saturation make
much the largest contribution to S, and therefore to the
multiple correlation.
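Spearman's shorter calculation is brief enough to set down in full. The sketch below is an illustration in Python, not part of the original text; it uses only the four saturations of the example and reproduces the quantity S, the weights, and the squared multiple correlation.

```python
saturations = [0.9, 0.8, 0.7, 0.6]     # r_ig for the four tests

# S is the sum of r^2 / (1 - r^2) over the tests.
S = sum(r * r / (1.0 - r * r) for r in saturations)

# Each regression coefficient is r / (1 - r^2), divided by (1 + S).
weights = [(r / (1.0 - r * r)) / (1.0 + S) for r in saturations]

# Squared multiple correlation: sum of weight x saturation, equal to S / (1 + S).
r_m_squared = sum(w * r for w, r in zip(weights, saturations))
print(round(S, 4), [round(w, 4) for w in weights], round(r_m_squared, 3))
```

Adding a further hierarchical test appends one more positive term to S, so the fraction S/(1 + S) can only rise, which is the point made in the paragraph above.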

2. Estimating two factors simultaneously.—We have seen
in the preceding section how to estimate a man’s g from 
his scores in a hierarchical team of tests, and in this we 
shall consider the broader question of estimating factors in 
general. Thus in Chapter V the four tests with corre- 
lations : 


             1      2      3      4
      1      .      .4     .4     .2
      2      .4     .      .7     .3
      3      .4     .7     .      .3
      4      .2     .3     .3     .

were analysed into two common factors and four specifics
with the loadings (see Chapter V, page 79).

              Common Factors    |          Specific Factors
    Test        I        II     |
     1        .5164      .      |   .8563     .        .        .
     2        .7746    .3162    |    .       .5477     .        .
     3        .7746    .3162    |    .        .       .5477     .
     4        .3873      .      |    .        .        .       .9220


Any one column of these loadings can be used as the 
criterion row in the calculation by Aitken’s method, and 
the regression coefficients calculated with which to weight 
a man’s test scores in order to estimate that factor for 
him. If, as is probable, we want to estimate both common 
factors, we can do the two calculations together, as shown 
on the next page. Both arrays of loadings are written
below the matrix of intercorrelations, and then pivotal
condensation automatically gives both sets of regression
coefficients, with only one extra row in each slab of the
calculation :

                                                                              Check
  (1.0)     .4       .4       .2    | −1.0      .        .        .      |  1.0
   .4      1.0       .7       .3    |   .     −1.0       .        .      |  1.4
   .4       .7      1.0       .3    |   .       .      −1.0       .      |  1.4
   .2       .3       .3      1.0    |   .       .        .      −1.0     |   .8
   .5164    .7746    .7746    .3873 |   .       .        .        .      |  2.4529
   .        .3162    .3162    .     |   .       .        .        .      |   .6324

           (.84)     .54      .22   |  .40    −1.0       .        .      |  1.0
           1.00      .6429    .2619 |  .4762  −1.1905    .        .      |  1.1905
            .54      .84      .22   |  .40      .      −1.0       .      |  1.0
            .22      .22      .96   |  .20      .        .      −1.0     |   .6
            .5680    .5680    .2840 |  .5164    .        .        .      |  1.9365
            .3162    .3162    .     |   .       .        .        .      |   .6324

                    (.4928)   .0786 |  .1429    .6429  −1.0       .      |   .3571
                    1.0000    .1595 |  .2900   1.3046  −2.0292    .      |   .7246
                     .0786    .9024 |  .0952    .2619    .      −1.0000  |   .3381
                     .2028    .1352 |  .2459    .6762    .        .      |  1.2603
                     .1129   −.0828 | −.1506    .3764    .        .      |   .2560

                             (.8899)|  .0724    .1594    .1594  −1.0000  |   .2811
                             1.0000 |  .0814    .1791    .1791  −1.1237  |   .3159
                              .1029 |  .1871    .4116    .4116    .      |  1.1134
                             −.1008 | −.1833    .2291    .2291    .      |   .1742

                                       .1787    .3932    .3932    .1156  |  1.0809
                                      −.1751    .2472    .2472   −.1133  |   .2060
                                            Regression Coefficients


If, therefore, we have a man’s scores (in standard
measure) in these four tests, our estimate of his Factor I
will be—

$$.1787 z_1 + .3932 z_2 + .3932 z_3 + .1156 z_4$$

and estimates made in this way will have a multiple
correlation $r_m$ with the “true” values of the factor, in a
number of different candidates, given by—

$$r_m^2 = .1787 \times .5164 + .3932 \times .7746 + .3932 \times .7746 + .1156 \times .3873 = .7462$$

$$\therefore\; r_m = .864$$

Similarly, the multiple correlation of the estimate of the
second factor with the “true” values can be found to be—

$$r_m = .395$$

The two factors are not, therefore, estimated with equal
accuracy by the team. As before, the whole calculation
can be checked by a pooling square.
We have now found the regression equations for esti- 
mating the two common factors by treating each in turn 
as a “ criterion.” It is also possible to estimate a man’s 
specific factors in the same way. Indeed, we might have 
written the loadings of the specific factors as four more 
rows below the common-factor loadings in the first slab 
and calculated their regression coefficients all in the one 
calculation. But it is easier to obtain the estimate of a
man’s specific by subtraction (compare Abilities, 1932
edition, page xviii, line 10). For example, we know that
the second test score is made up as follows—

$$z_2 = .7746 f_1 + .3162 f_2 + .5477 s_2$$

where $f_1$ and $f_2$ are the man’s common factors and $s_2$ his
specific. We have estimated his $f_1$ and $f_2$, and we know
his $z_2$; so we can estimate his $s_2$ from this equation. The
estimates of all a man’s factors, to be consistent with the 
experimental data, must satisfy this equation and similar 
equations for the other tests. If the estimate of the 
specific is actually made by a regression equation, just like 
the other factors, it will be found to satisfy this require- 
ment.* From the estimates of all a man’s factors, there- 
fore, including any specifics, we can reconstruct his scores 
in the tests exactly. From only a few factors, however, 
even from all the common factors, we cannot reproduce 
the scores exactly, but only approximately. 
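The agreement between the two ways of estimating a specific can be tried on a single case. The sketch below is an illustration in Python, not part of the original text; the four standard scores are invented for the purpose, and the names `solve` and `estimate` are hypothetical. Since the loadings are given to four decimals only, the two estimates agree to about three decimals and not exactly.

```python
R = [[1.0, 0.4, 0.4, 0.2],
     [0.4, 1.0, 0.7, 0.3],
     [0.4, 0.7, 1.0, 0.3],
     [0.2, 0.3, 0.3, 1.0]]
factor_1 = [0.5164, 0.7746, 0.7746, 0.3873]
factor_2 = [0.0, 0.3162, 0.3162, 0.0]
specific_2 = [0.0, 0.5477, 0.0, 0.0]      # loadings of the specific of Test 2
scores = [1.2, 0.5, 0.9, -0.3]            # hypothetical standard scores of one man


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]
        for i in range(n):
            if i != k:
                f = m[i][k]
                m[i] = [vi - f * vk for vi, vk in zip(m[i], m[k])]
    return [m[i][n] for i in range(n)]


def estimate(loadings):
    """Regression estimate of a factor from the man's four scores."""
    weights = solve(R, loadings)
    return sum(w * z for w, z in zip(weights, scores))


f1_hat = estimate(factor_1)
f2_hat = estimate(factor_2)
s2_by_regression = estimate(specific_2)
s2_by_subtraction = (scores[1] - 0.7746 * f1_hat - 0.3162 * f2_hat) / 0.5477
print(round(s2_by_regression, 3), round(s2_by_subtraction, 3))
```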

* It is interesting to note that we know the best relative loadings
of the tests to estimate a specific by regression without needing to
know how many common factors there are, or whether indeed any
specific exists or not. (Wilson, 1934. For the same fact in more
familiar notation, see Thomson, 1936a, 43.)

3. An arithmetical short cut (Ledermann, 1938a, 1939b).—
If the number of tests is appreciably greater than the
number of common factors, the following scheme for
computing the regression coefficients will involve less
arithmetical labour than the general formulæ expounded
in Chapter XIV and applied to the factor problem in this
chapter.*

For illustration, we shall use the data of the preceding 
section (page 225), although in that example the number 
of tests (four) exceeds the number of common factors (two) 
only by two, which is too small an amount to demonstrate 
fully the advantages of the present method. The common-factor
loadings and the specifics of the four tests form a
4 × 2 matrix and a 4 × 4 matrix respectively, thus:

$$M_0 = \begin{bmatrix} .5164 & \cdot \\ .7746 & .3162 \\ .7746 & .3162 \\ .3873 & \cdot \end{bmatrix}, \qquad M_1 = \begin{bmatrix} .8563 & \cdot & \cdot & \cdot \\ \cdot & .5477 & \cdot & \cdot \\ \cdot & \cdot & .5477 & \cdot \\ \cdot & \cdot & \cdot & .9220 \end{bmatrix}$$

the matrix $M_0$ being identical with the first two columns,
and the matrix $M_1$ with the last four columns of the table
on page 225. Before the data are subjected to the computational
routine process, which will again consist in the
pivotal condensation of a certain array of numbers, some
preliminary steps have to be taken: (i) the loadings of
each test are divided by the square of its specific, and the
modified values are then listed in a new 4 × 2 matrix:

$$M_1^{-2} M_0 = \begin{bmatrix} .7042 & \cdot \\ 2.5820 & 1.0540 \\ 2.5820 & 1.0540 \\ .4556 & \cdot \end{bmatrix}$$

e.g. $2.5820 = .7746 \div (.5477)^2$ and $1.0540 = .3162 \div (.5477)^2$.

(ii) Next, the inner products (see footnote on page 74) of
every column of $M_0$ in turn with every column of $M_1^{-2} M_0$ are
calculated and arranged in a 2 × 2 matrix:

$$J = M_0' M_1^{-2} M_0 = \begin{bmatrix} 4.5401 & 1.6329 \\ 1.6329 & .6665 \end{bmatrix}$$

If there had been r common factors the matrix J would
have been an r × r matrix. The arithmetic is simplified
by the fact that J is always symmetrical about its diagonal,
so that only the entries on and above (or below) the diagonal
need be calculated. (iii) Finally, each element on the
diagonal of J is augmented by unity, giving, in the notation
of matrix calculus, the matrix:

$$I + J = \begin{bmatrix} 5.5401 & 1.6329 \\ 1.6329 & 1.6665 \end{bmatrix}$$

This matrix is now “bordered” below by the matrix
$M_1^{-2} M_0$, and on the right-hand side by a block of minus ones
and zeros in the usual way. The process of pivotal
condensation then yields the same regression coefficients
as were obtained on page 226.

* This short cut, in the form here given, is only applicable to
orthogonal factors. For oblique factors, which are described in
Chapter XII, modifications are necessary in Ledermann’s formulæ,
for which see Thomson (1949) and the later part of Section 19 of the
Mathematical Appendix, page 365.


                                                              Check
              5.5401    1.6329   −1.0000      .              6.1730
              1.0000     .2947    −.1805      .              1.1142
              1.6329    1.6665      .      −1.0000           2.2994
               .7042      .         .         .               .7042
              2.5820    1.0540      .         .              3.6360
              2.5820    1.0540      .         .              3.6360
               .4556      .         .         .               .4556

                        1.1853     .2947   −1.0000            .4800
                        1.0000     .2486    −.8437            .4050
                        −.2075     .1271      .              −.0804
                         .2931     .4661      .               .7591
                         .2931     .4661      .               .7591
                        −.1343     .0822      .              −.0520

 Regression                        .1787    −.1751            .0036
 Coefficients                      .3932     .2473            .6404
                                   .3932     .2473            .6404
                                   .1156    −.1133            .0023
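
The three preliminary steps and the final solution may be sketched in a few lines of Python. This is an illustrative sketch only, not Ledermann's own layout: the names are our own, and a direct inversion of the 2 × 2 matrix is used in place of pivotal condensation, which leads to the same coefficients in this small case.

```python
# Common-factor loadings (4 tests x 2 factors) and the specific loadings.
M0 = [[0.5164, 0.0], [0.7746, 0.3162], [0.7746, 0.3162], [0.3873, 0.0]]
specifics = [0.8563, 0.5477, 0.5477, 0.9220]

# Step (i): divide each test's loadings by the square of its specific.
modified = [[x / u ** 2 for x in row] for row, u in zip(M0, specifics)]

# Step (ii): inner products of the columns of M0 with those of `modified`.
J = [[sum(M0[t][i] * modified[t][j] for t in range(4)) for j in range(2)]
     for i in range(2)]

# Step (iii): augment each diagonal element by unity.
A = [[J[i][j] + (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]

# Invert the 2 x 2 matrix directly (a stand-in for pivotal condensation).
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det, -A[0][1] / det],
         [-A[1][0] / det, A[0][0] / det]]

# Regression coefficients: one row per factor, one column per test.
coeffs = [[sum(A_inv[i][k] * modified[t][k] for k in range(2))
           for t in range(4)] for i in range(2)]
for row in coeffs:
    print([round(c, 4) for c in row])
```

With r common factors the matrix to be inverted is only r × r, however many tests there are, which is the source of the saving in labour.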


4. Reproducing the original scores.—Let us imagine a
man who in each of the four tests in our example obtains
a score of +1; that is, one standard deviation above the
average. We choose this set of scores merely to make the
arithmetic of the example easy. The regression estimates
of his two common factors are—

$$\begin{aligned} \hat f_1 &= .1787 z_1 + .3932 z_2 + .3932 z_3 + .1156 z_4 \\ \hat f_2 &= -.1751 z_1 + .2472 z_2 + .2472 z_3 - .1133 z_4 \end{aligned}$$

Inserting his scores $z_1 = z_2 = z_3 = z_4 = 1$ into these
equations we get for the regression estimates of his factors—

$$\hat f_1 = 1.0807, \qquad \hat f_2 = .2060$$

that is, we estimate his first factor to be rather more than
one standard deviation, his second factor to be about
one-fifth of a standard deviation, above the average.
Now, the specification equations which give the composition
of the four tests in terms of the factors are—

$$\begin{aligned} z_1 &= .5164 f_1 + .8563 s_1 \\ z_2 &= .7746 f_1 + .3162 f_2 + .5477 s_2 \\ z_3 &= .7746 f_1 + .3162 f_2 + .5477 s_3 \\ z_4 &= .3873 f_1 + .9220 s_4 \end{aligned}$$

If we insert the above estimates $\hat f_1$ and $\hat f_2$ in lieu of $f_1$ and
$f_2$, we get for this man’s scores—

$$\begin{aligned} z_1 &= .5581 + .8563 \hat s_1 \\ z_2 &= .9022 + .5477 \hat s_2 \\ z_3 &= .9022 + .5477 \hat s_3 \\ z_4 &= .4186 + .9220 \hat s_4 \end{aligned}$$

We know his four scores each to have been +1, and
if we had also worked out the estimates of his specifics
by the regression method we should have found that they
added just enough to the above equations to make each
indeed come to +1. We can, therefore, find his estimated
specifics more easily from the above equations, as in this
case—

$$\hat s_1 = \frac{1 - .5581}{.8563} = .516, \qquad \hat s_2 = \frac{1 - .9022}{.5477} = .179$$

and so for $\hat s_3$ and $\hat s_4$, subtracting the contribution of the
common factors from the known score (here +1 in each
case) and dividing by the specific loading.

The regression estimates of the factors, made by the
system we have so far been considering, are as a matter 
of fact not the only estimates which have been proposed 
(see Section 8 later). The regression estimates are the 
best in the sense that they give the highest correlation, 
taken over a large number of men, between the estimates 
and the true values of a criterion when the latter can be 
separately ascertained. 

The regression estimates of the factors have one other 
great advantage, that they are consistent with the ordinary 
estimation of vocational ability made without using factors 
at all, as can best be shown by means of the example of 
Section 7 of Chapter XIV. 
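
The subtraction procedure of this section, for the man scoring +1 in every test, can be checked as follows. This is a sketch under our own naming, using the loadings and regression weights of the example as printed; it estimates the common factors, obtains each specific by subtraction, and confirms that all the estimates together rebuild the four scores exactly.

```python
# Loadings on the two common factors, and the specific loading of each test.
M0 = [[0.5164, 0.0], [0.7746, 0.3162], [0.7746, 0.3162], [0.3873, 0.0]]
specific_loadings = [0.8563, 0.5477, 0.5477, 0.9220]

# Regression weights for the two common factors (one row per factor).
weights = [[0.1787, 0.3932, 0.3932, 0.1156],
           [-0.1751, 0.2472, 0.2472, -0.1133]]

scores = [1.0, 1.0, 1.0, 1.0]

# Estimated common factors.
f_hat = [sum(w * z for w, z in zip(row, scores)) for row in weights]

# Contribution of the common factors to each test score.
common_part = [sum(l * f for l, f in zip(row, f_hat)) for row in M0]

# Specifics by subtraction: (score - common part) / specific loading.
s_hat = [(z - c) / u for z, c, u in zip(scores, common_part, specific_loadings)]

# Rebuilding the scores from all the estimated factors.
rebuilt = [c + u * s for c, u, s in zip(common_part, specific_loadings, s_hat)]

print([round(f, 4) for f in f_hat])
print([round(s, 3) for s in s_hat])
```

Dropping the specifics from the last step leaves only `common_part`, which is the approximate reproduction obtainable from the common factors alone.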

5. Vocational advice with and without factors.—In that
example we had an “occupation” $z_0$ and four tests
$z_1$, $z_2$, $z_3$, and $z_4$; and in Chapter XIV, without using factors
at all, we arrived at the following estimation of a man’s
success or “score” in the occupation (which is, after all,
only a test like the others, though a long-drawn-out one)—

$$\hat z_0 = .390 z_1 + .431 z_2 + .222 z_3 + .018 z_4$$

Now let us suppose that the matrix of correlations of
these five tests (including the occupation as a test) had
been analysed, by Thurstone’s method or any other, into
common factors and specifics—the matrix is given in
Chapter XIV, page 204. Indeed, the four tests proper were
so analysed by Dr. Alexander in the monograph from which
we took their correlations, and the analysis below is based
on his. The “occupation” $z_0$ is a pure fiction made for
the purpose of this illustration, but we can easily imagine it
also being analysed in exactly the same way as a test.
The table of loadings of the factors, to which we may as
well give Dr. Alexander’s names of g (Spearman’s g), v (a
verbal factor), and F (a practical factor), is as follows:

                                    g      v      F    Specific
Occupation               $z_0$     .55    .45    .60     .37
Stanford-Binet           $z_1$     .66    .52    .21     .50
Picture completion       $z_2$     .37     .     .71     .60
Reading test             $z_3$     .52    .66     .      .54
Geometrical analogies    $z_4$     .74     .      .      .67

With this table of loadings in our possession we might
have given vocational advice to a man in a roundabout
way. Instead of inserting his scores in $z_1$, $z_2$, $z_3$, and $z_4$ in
the equation for $\hat z_0$, we might have estimated his factors
g, v, and F from his scores in the four tests, and then
inserted these estimated factors in the specification equation
of the occupation—

$$z_0 = .55 g + .45 v + .60 F + .37 s_0$$

(ignoring the specific $s_0$, which cannot be estimated from
$z_1$, $z_2$, $z_3$, and $z_4$). Had we done so, we should have arrived
at exactly the same numerical estimate of his $z_0$ as by the
direct method (Thomson, 1936a, 49 and 50).

The actual estimation of the factors g, v, and F from 
the four tests will form a good arithmetical exercise for the 
student. The beginning and end of the calculation of the 
regression coefficients is shown here, following exactly 
the lines of the smaller example on page 226 of this chapter : 


                                                              Check

  1.00    .39    .69    .49  |   −1      .      .      .       1.57
   .39   1.00    .19    .27  |    .     −1      .      .        .85
   .69    .19   1.00    .38  |    .      .     −1      .       1.26
   .49    .27    .38   1.00  |    .      .      .     −1       1.14
   .66    .37    .52    .74  |    .      .      .      .       2.29
   .52     .     .66     .   |    .      .      .      .       1.18
   .21    .71     .      .   |    .      .      .      .        .92


This reduces by pivotal condensation step by step to the
three sets of regression coefficients:

for $\hat g$     .300     .095     .095     .532
for $\hat v$     .353    −.153     .581    −.352
for $\hat F$     .121     .747    −.148    −.206


The result is to give us three equations for estimating
g, v, and F from a man’s scores in the four tests, viz.—

$$\begin{aligned} \hat g &= .300 z_1 + .095 z_2 + .095 z_3 + .532 z_4 \\ \hat v &= .353 z_1 - .153 z_2 + .581 z_3 - .352 z_4 \\ \hat F &= .121 z_1 + .747 z_2 - .148 z_3 - .206 z_4 \end{aligned}$$
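
For the reader who prefers to let a machine carry out the reduction, the regression weights are the solutions of the equations whose coefficients are the correlations of the tests and whose right-hand sides are the loadings of each factor in turn. The sketch below is only an illustration: `solve` is a plain Gauss-Jordan elimination of our own writing, assumed here as a stand-in for pivotal condensation, and because the correlations are rounded to two decimals the weights agree with those printed above only to about the second decimal place.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination.

    No row interchanges are made; the correlation matrix used below is
    assumed to be positive definite, so that every pivot is positive.
    """
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for p in range(n):
        pivot = rows[p][p]
        rows[p] = [value / pivot for value in rows[p]]
        for r in range(n):
            if r != p:
                factor = rows[r][p]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[p])]
    return [rows[i][n] for i in range(n)]


# Correlations of the four tests with one another.
R = [[1.00, 0.39, 0.69, 0.49],
     [0.39, 1.00, 0.19, 0.27],
     [0.69, 0.19, 1.00, 0.38],
     [0.49, 0.27, 0.38, 1.00]]

# Loadings of the four tests on each common factor.
loadings = {"g": [0.66, 0.37, 0.52, 0.74],
            "v": [0.52, 0.00, 0.66, 0.00],
            "F": [0.21, 0.71, 0.00, 0.00]}

weights = {name: solve(R, column) for name, column in loadings.items()}
for name, w in weights.items():
    print(name, [round(x, 3) for x in w])
```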


Now let us assume a set of scores $z_1$, $z_2$, $z_3$, $z_4$ for a man,
and see what the estimate of his occupational ability is by
the two methods, the one direct without using factors, the
other by way of factors. Suppose his four scores are—

  $z_1$    $z_2$    $z_3$    $z_4$
   .2       .6      −.4       .7

The estimates of his factors g, v, and F will therefore be—

$$\begin{aligned} \hat g &= .300 \times .2 + .095 \times .6 + .095 \times (-.4) + .532 \times .7 = .451 \\ \hat v &= .353 \times .2 - .153 \times .6 + .581 \times (-.4) - .352 \times .7 = -.500 \\ \hat F &= .121 \times .2 + .747 \times .6 - .148 \times (-.4) - .206 \times .7 = .387 \end{aligned}$$

If now we insert these estimates of his factors into
the specification equation of the occupation, ignoring its
specific, we get for our estimate of his occupational success:

$$\hat z_0 = .55 \times .451 + .45 \times (-.500) + .60 \times .387 = .255$$

that is, we estimate that he will be about a quarter of a
standard deviation better than the average workman.
This by the indirect method using factors.

By the direct method, without using factors at all, we
simply insert his test scores into the equation—

$$\hat z_0 = .390 z_1 + .431 z_2 + .222 z_3 + .018 z_4$$

and obtain—

$$\hat z_0 = .390 \times .2 + .431 \times .6 + .222 \times (-.4) + .018 \times .7 = .260$$
exactly the same estimate as before—for the difference in 
the third decimal place is entirely due to “ rounding off ” 
during the calculations. The third decimal place of the 
direct calculation is more likely to be correct, since it is 
so much shorter. 
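
The two routes can be set side by side in a short Python sketch. The names are our own and the weights are those printed above to three decimals, so the two estimates agree only as far as that rounding allows.

```python
scores = [0.2, 0.6, -0.4, 0.7]

# Regression weights for estimating each factor from the four tests.
factor_weights = {"g": [0.300, 0.095, 0.095, 0.532],
                  "v": [0.353, -0.153, 0.581, -0.352],
                  "F": [0.121, 0.747, -0.148, -0.206]}

# Loadings of the occupation on the common factors (its specific is ignored).
occupation_loadings = {"g": 0.55, "v": 0.45, "F": 0.60}

# Direct regression weights of the occupation on the four tests.
direct_weights = [0.390, 0.431, 0.222, 0.018]

# Indirect route: estimate the factors, then use the specification equation.
factors = {name: sum(w * z for w, z in zip(ws, scores))
           for name, ws in factor_weights.items()}
indirect = sum(occupation_loadings[name] * factors[name] for name in factors)

# Direct route: insert the scores into the regression equation.
direct = sum(w * z for w, z in zip(direct_weights, scores))

print({name: round(value, 3) for name, value in factors.items()})
print(round(indirect, 3), round(direct, 3))
```

Carrying more decimals in the factor weights would bring the two figures closer still; the small remaining gap is a matter of rounding and not of method.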

6. Why, then, use factors at all?—The reader may now
ask, “What, then, is the use of estimating a man’s factors
at all?” Well, in a case analogous to that of the present
example it is quite unnecessary to use factors at all, and
there is no doubt that a great many experimenters have
rushed to factorial analysis with quite unjustifiable hopes
of somehow getting more out of it than ordinary methods
of vocational and educational advice can give without
mentioning factors. But we must not go to the other
extreme and “throw out the baby with the bath-water.”
There may be other reasons for using factors, apart from
vocational advice. And even in giving such advice, which
really means describing men and occupations in similar
terms, so that we can see if they fit one another or not, it
may be that factors have some advantages not disclosed
by the above calculation.

This man whom we have already used, for example, may
be described either in terms of his scores in four fairly
well-known tests, or in terms of the factors g, v, and F.
By the former method his description is:

Stanford-Binet test                   .2, slightly above average
Picture-completion test               .6, good
Thorndike reading test               −.4, distinctly below average
Spearman’s geometrical analogies      .7, good


This description already suggests to us that he is a man of 
average intelligence or rather better, of not much schooling, 
and with a bit of a gift for seeing shapes, and similarities in 
them. From the correlations of the occupation with these 
four tests we know that it most resembles the first and last 
tests and least resembles the third. We can probably 
draw the conclusion that this man will be above average 
in it; and we can draw this conclusion accurately if we 
calculate the regression equation—

$$\hat z_0 = .390 z_1 + .431 z_2 + .222 z_3 + .018 z_4$$

As a description of the man, however, the above table
suffers from the fact that the four tests are correlated with
one another. We feel a certain clarity in the description
in terms of factors, because these are independent of one
another and uncorrelated. This man whom we are at
present considering is alternatively described, in terms of
factors, as:

Factor     Estimated Amount
  g              .451
  v             −.500
  F              .387


that is, a quite intelligent (g) and practical (F) man with, 
however, not much ability in using and understanding 
words (v). There is a certain air of greater generality 
about the factors than there is about the particular tests
from which they have been deduced, and they give
definition and point to mental descriptions, or at least they 
seem to do so. 

Yet some of these “ advantages ” of using factors begin 
to look less bright when looked into more carefully. We 
said that one advantage is that factors are independent 
and uncorrelated. So they are, if their true values are 
known. But we only know their estimates, and these are 
correlated, as we shall illustrate shortly. If we use factors
it is clear that we must, if we value the advantage of 
independence, seek to obtain estimates which are as little 
correlated with one another as possible. There have been 
proposals to use factors which are really correlated ; not 
merely correlated when their estimates are taken, but 
correlated in their true measures. What advantage can 
these have over the actual correlated tests? The fundamental
advantage hoped for by the factorist seems to be
that the factors (correlated or uncorrelated) may turn out 
to be comparatively few in number, and may thus replace 
a multitude of tests and innumerable occupations by a 
description in these few factors. The student whose 
knowledge of the subject is being obtained from this book 
is not yet equipped to discuss adequately the very funda- 
mental questions raised in this section, to which we shall 
return several times in later chapters. One last point in
favour of factors may, however, be expanded somewhat 
here. We said a couple of sentences back that factorists 
hope to give adequate descriptions of men and of occupa- 
tions in terms of a comparatively small number of factors. 
This, if achieved, would react on social problems somewhat 
in the same way as the introduction of a coinage influences 
trade previously carried on by barter. A man can ex- 
change directly five cows for so many sheep, so much 
cloth, and a new ploughshare; but the transaction is 
facilitated if each of these articles is priced in pounds, 
shillings, and pence, or in dollars and cents, even though 
the end-result is the same. And so perhaps with the 
“ pricing ” of each man and each occupation in terms of a 
few factors. 

But the prices must be accurate; and the analyses of
tests and occupations into factors, still more the calculation
of quantitative estimates of these factors, are as yet very 
inaccurate, and perhaps are inherently subject to uncer- 
tainty. A fluctuating and doubtful coinage can be a 
positive hindrance to trade, and barter may be preferable 
in such circumstances. 

We showed in Section 5 above that a direct regression 
estimate of a man’s ability in an occupation gives identically 
the same result as an estimate via the roundabout path of 
factors, so that at least when the direct regression estimate 
is possible there can be no quantitative advantage in using 
factors. When, however, is the direct regression estimate 
possible, and when is it impossible ? 

To make the direct regression estimate we require the 
complete table of correlations of the tests with one another 
and with the occupation, and we have to know the candidate’s 
scores in the tests. This implies that these same tests have 
been given to a number of workers whose proficiency in the 
occupation is known, for otherwise we would not know the 
correlations of the tests with the occupation. Under these 
ideal circumstances any talk of factors is certainly unneces- 
sary so far as obtaining a quantitative estimate is concerned. 

But suppose these ideal conditions do not hold! These 
tests which we have given to the candidate have never 
been given, at any rate as a battery, to workers in the 
occupation, and their correlations with the occupation are 
unknown! This situation is particularly likely to arise
in vocational advice or guidance as distinguished from 
vocational selection. In the latter we are, usually on 
behalf of the employer, selecting men for a particular job, 
and we are practically certain to have tried our tests on 
people already in the job, and to be in a position to make 
a direct estimation without factors. But in vocational 
guidance we wish to gauge the young person’s ability in 
very many occupations, and it is unlikely that just this 
battery of tests that we are using has been given to workers 
in all these different jobs. In that case we cannot make 
a direct regression estimate of our candidate’s probable 
proficiency in every occupation. Can we, then, obtain an 
estimate in any other way?

Other ways are conceivable, but it must at the outset
be emphasized that they are bound to be less accurate than 
the direct estimate without factors. Although this battery 
of tests has not been given to workers in the occupation, 
perhaps other tests have, and by the aid of that other 
battery a factor analysis of the occupation has perhaps 
been made. If our tests enable the same factors to be 
estimated, we can gauge the man’s factors and thence 
indirectly his occupational proficiency. Unfortunately, 
the “if” is a rather big one. Are factors obtained by
the analysis of different batteries of tests the same factors;
may they not be different even though given the same
name? We shall discuss this very important point later,
but meanwhile let us suppose that we have reasonable
confidence in the identity of factors called by the same
name by different workers with different batteries. Then
the probable course of events would be something like this.
An experimenter, using whatever tests he thinks practicable
and suitable, analyses an occupation into factors. Another
experimenter, at a different time and place, is asked to
give advice to a candidate for that occupation. Using
whatever tests he in his turn has available, he assesses in
this candidate the factors which the previous experimenter’s
work leads him to think are necessary in the occupation,
and gives his advice accordingly. The factors have played
their part as a go-between, like a coinage. All depends on
the confidence we have in the identity of the factors. We
shall see later that there is only too much reason to think
that the possibility of this confidence being misplaced has
hardly been sufficiently realized by many over-enthusiastic
factorists. And even if the common factors are identical,
there remains the danger that the “specific” of the occupation
may be correlated with some of the “specifics”
of the tests, a fact which cannot be known unless the same
tests have been given to workers in the occupation.

7. Calculation of correlation between estimates.—We said
above that even although we make our analysis of the tests
we use into uncorrelated factors, the estimates of these
factors will be correlated, if we use communalities and thus
have more factors than tests. Arithmetically, these
correlations are easily calculated from the inner products
of (b), the loadings of the estimated factors with the tests
(page 232), with (a), the loadings of the tests with the
factors (page 231).

The matrix of loadings of the four tests with the three
common factors is (page 231):

$$M = \begin{bmatrix} .66 & .52 & .21 \\ .37 & \cdot & .71 \\ .52 & .66 & \cdot \\ .74 & \cdot & \cdot \end{bmatrix}$$

and the matrix of the loadings of the three estimated
factors with the four tests is (page 232):

$$N = \begin{bmatrix} .300 & .095 & .095 & .532 \\ .353 & -.153 & .581 & -.352 \\ .121 & .747 & -.148 & -.206 \end{bmatrix}$$

Then the matrix of variances and covariances of the
estimated factors is—

$$K = NM$$

Performing the matrix multiplications as explained in
Chapter X, Section 4, page 145, we obtain:

$$NM = \begin{bmatrix} .300 & .095 & .095 & .532 \\ .353 & -.153 & .581 & -.352 \\ .121 & .747 & -.148 & -.206 \end{bmatrix} \begin{bmatrix} .66 & .52 & .21 \\ .37 & \cdot & .71 \\ .52 & .66 & \cdot \\ .74 & \cdot & \cdot \end{bmatrix} = \begin{bmatrix} .676 & .219 & .130 \\ .218 & .567 & -.034 \\ .127 & -.035 & .556 \end{bmatrix} = K$$

If our arithmetic throughout the whole calculation of
these loadings had been perfectly accurate, the matrix K
would have been perfectly symmetrical about its diagonal.
The actual discrepancies (as .127 and .130) are a measure
of the degree of arithmetical accuracy attained.

The matrix K thus arrived at gives by its diagonal
elements .676, .567, and .556, the variances of the three
estimated factors (that is, the squares of their standard
deviations), and by its other elements their covariances in
pairs (that is, their overlap with one another). The
correlation of any two estimated factors is equal to (see
Chapter I, Figure 2)—

$$r_{ij} = \frac{\text{covariance}\,(ij)}{\sqrt{\text{variance}\,(i) \times \text{variance}\,(j)}}$$

From K we can therefore form the matrix of correlation
of the estimated factors. It is:

$$\begin{bmatrix} 1.000 & .353 & .212 \\ .353 & 1.000 & -.061 \\ .212 & -.061 & 1.000 \end{bmatrix}$$

wherein .353, for example, is $.219 \div \sqrt{.676 \times .567}$.
Although, therefore, the “true” factors g and v are uncorrelated,
their estimates $\hat g$ and $\hat v$ are correlated to an
amount .353. The “true” factors g, v, and F are in standard
measure, but their estimates $\hat g$, $\hat v$, and $\hat F$ have variances of
only .676, .567, and .556 instead of unity. These variances,
be it noted in passing, are equal also to the squares of the
correlations between g and $\hat g$, v and $\hat v$, F and $\hat F$.
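
The product NM and the resulting correlations are easily checked by machine. In this sketch (names our own) the two off-diagonal elements of each pair are averaged before the correlation is formed, which is one simple way, assumed here, of adjusting the small departures from symmetry.

```python
# N: loadings of the three estimated factors with the four tests (3 x 4).
N = [[0.300, 0.095, 0.095, 0.532],
     [0.353, -0.153, 0.581, -0.352],
     [0.121, 0.747, -0.148, -0.206]]

# M: loadings of the four tests with the three common factors (4 x 3).
M = [[0.66, 0.52, 0.21],
     [0.37, 0.00, 0.71],
     [0.52, 0.66, 0.00],
     [0.74, 0.00, 0.00]]

# K = NM: variances (diagonal) and covariances of the estimated factors.
K = [[sum(N[i][t] * M[t][j] for t in range(4)) for j in range(3)]
     for i in range(3)]

# Correlations, after averaging each pair of off-diagonal elements.
corr = [[(K[i][j] + K[j][i]) / 2 / (K[i][i] * K[j][j]) ** 0.5
         for j in range(3)] for i in range(3)]

for row in K:
    print([round(x, 3) for x in row])
for row in corr:
    print([round(x, 3) for x in row])
```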

Not only are the estimates of the common factors 
correlated among themselves; they are correlated with 
the specifics, so that the estimates of the specifics are not 
strictly specific. As a numerical illustration we may take 
the hierarchical matrix used in Section 1, pages 221 ff. 


          $z_1$     $z_2$     $z_3$     $z_4$
$z_1$     1.00      .72       .63       .54
$z_2$      .72     1.00       .56       .48
$z_3$      .63      .56      1.00       .42
$z_4$      .54      .48       .42      1.00

The regression estimate of g from this battery is, as we
found on page 223—

$$\hat g = .553 z_1 + .259 z_2 + .160 z_3 + .109 z_4$$

The regression estimates for the four specifics can also
be found, either by a full calculation like that of page
226, or by the simpler method of subtraction of page
227. Thus, to estimate $s_1$ in our present example we
know that—

$$z_1 = .9 g + \sqrt{1 - .9^2}\; s_1 = .9 g + .436 s_1$$

Also we know that the estimates $\hat g$ and $\hat s_1$ will satisfy the
same equation—

$$z_1 = .9 \hat g + .436 \hat s_1$$

that is—

$$\hat s_1 = \frac{z_1 - .9 \hat g}{.436}$$

On inserting the expression for $\hat g$ into this we get—

$$\hat s_1 = 1.152 z_1 - .535 z_2 - .333 z_3 - .225 z_4$$

and similarly—

$$\begin{aligned} \hat s_2 &= -.737 z_1 + 1.313 z_2 - .215 z_3 - .145 z_4 \\ \hat s_3 &= -.542 z_1 - .253 z_2 + 1.242 z_3 - .106 z_4 \\ \hat s_4 &= -.415 z_1 - .194 z_2 - .121 z_3 + 1.169 z_4 \end{aligned}$$

We have now both N, the matrix of loadings of the
estimated factors $\hat g$, $\hat s_1$, $\hat s_2$, $\hat s_3$, $\hat s_4$ with the four tests, and
M, which we already know, the matrix of loadings of the
four tests with the five factors g, $s_1$, $s_2$, $s_3$ and $s_4$, namely:

$$M = \begin{bmatrix} .9 & .436 & \cdot & \cdot & \cdot \\ .8 & \cdot & .600 & \cdot & \cdot \\ .7 & \cdot & \cdot & .714 & \cdot \\ .6 & \cdot & \cdot & \cdot & .800 \end{bmatrix}$$

From their product NM we obtain the matrix K of
variances and covariances of the estimated factors, namely:

$$NM = \begin{bmatrix} .553 & .259 & .161 & .109 \\ 1.152 & -.535 & -.333 & -.225 \\ -.737 & 1.313 & -.215 & -.145 \\ -.542 & -.253 & 1.242 & -.106 \\ -.415 & -.194 & -.121 & 1.169 \end{bmatrix} \begin{bmatrix} .9 & .436 & \cdot & \cdot & \cdot \\ .8 & \cdot & .600 & \cdot & \cdot \\ .7 & \cdot & \cdot & .714 & \cdot \\ .6 & \cdot & \cdot & \cdot & .800 \end{bmatrix}$$

$$= \begin{bmatrix} .880 & .241 & .155 & .115 & .087 \\ .241 & .502 & -.321 & -.238 & -.180 \\ .150 & -.321 & .788 & -.154 & -.116 \\ .116 & -.236 & -.152 & .887 & -.085 \\ .088 & -.181 & -.116 & -.086 & .935 \end{bmatrix} = K$$

Again, we have a check on the accuracy of our arithmetic,
for K will, if we have been accurate, be exactly
symmetrical about its principal diagonal, i.e. its diagonal
running from north-west to south-east. The largest discrepancy
in our case is between .150 and .155. Moreover,
since in this case K includes all the factors, we have another
check which was not available when we calculated a K for
common factors only: the sum of the elements in the
principal diagonal (called the “trace,” or in German the
“Spur”) here must come out equal to the number of tests.
In our case we have—

$$.880 + .502 + .788 + .887 + .935 = 3.992$$

and there are four tests. These elements which form the
trace of K are, it will be remembered, the variances of the
estimates $\hat g$, $\hat s_1$, $\hat s_2$, $\hat s_3$, and $\hat s_4$. So that we see that the total
variance of the five estimated factors is no greater than the total
variance (viz. 4) of the four tests in standard measure.
This is only another instance of the general law that we 
cannot get more out of anything than we put into it (at 
any rate, not in the long run). 
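
The trace check can be repeated without rounding. The sketch below is ours, not the book's: it assumes the usual single-factor result that the regression weights for g are proportional to r/(1 − r²), obtains the specifics by subtraction as above, and forms K = NM in full precision, when the trace comes out at 4 to within machine rounding (against the 3.992 of the three-figure working).

```python
loadings = [0.9, 0.8, 0.7, 0.6]                       # g loadings of the tests
n = len(loadings)
uniq = [(1 - r * r) ** 0.5 for r in loadings]          # specific loadings

# Regression weights for g-hat (assumed single-factor formula).
S = sum(r * r / (1 - r * r) for r in loadings)
g_weights = [r / (1 - r * r) / (1 + S) for r in loadings]

# N: first row estimates g; remaining rows estimate the specifics by
# subtraction, s_i-hat = (z_i - r_i * g-hat) / specific loading.
N = [g_weights]
for i in range(n):
    N.append([((1.0 if t == i else 0.0) - loadings[i] * g_weights[t]) / uniq[i]
              for t in range(n)])

# M: loadings of the tests on g and on the four specifics (4 x 5).
M = [[loadings[t]] + [uniq[t] if t == j else 0.0 for j in range(n)]
     for t in range(n)]

K = [[sum(N[i][t] * M[t][j] for t in range(n)) for j in range(n + 1)]
     for i in range(n + 1)]
trace = sum(K[i][i] for i in range(n + 1))

print([round(w, 3) for w in g_weights])
print(round(trace, 6))
```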

From K we can at once calculate the correlation of the
estimated factors. Adjusting the slight arithmetical departures
from symmetry, we get:

              $\hat g$      $\hat s_1$     $\hat s_2$     $\hat s_3$     $\hat s_4$
$\hat g$       1.000     .362      .184      .131      .096
$\hat s_1$      .362    1.000     −.510     −.354     −.263
$\hat s_2$      .184    −.510     1.000     −.183     −.135
$\hat s_3$      .131    −.354     −.183     1.000     −.094
$\hat s_4$      .096    −.263     −.135     −.094     1.000

from which we see that $\hat g$ is correlated with each of the
estimated specifics positively, while the latter are correlated
negatively among themselves, in this (a hierarchical)
example.

We have then this result, that although we set out to
analyse our battery of tests into independent uncorrelated
factors, the estimates which we make of these factors are
correlated with one another, and instead of being in
standard measure have variances, and therefore standard
deviations, less than unity. We could, of course, make
them unity by dividing all our estimates by their calculated
standard deviation. But that would make no change in
their correlations.

The cause of all this is the excess of factors over tests, 
and consequently this drawback—the correlation of the 
estimates—depends upon the ratio of the number of factors 
to the number of tests. The extra factors are the common 
factors, for there is a specific to each test, and therefore 
with the same number of common factors the correlation 
between the estimates will decrease as the number of tests 
in the battery increases. Just as in the hierarchical case 
one of the tasks of the experimenter is to find tests to add 
to the number in his battery without destroying its hier- 
archical nature, so in the case of a battery which can be 
reduced to rank 2, 3, 4 ... or r, a task will be to add
tests to the battery which with suitable communalities will
leave the rank unchanged and the pre-existing com- 
munalities unaltered, in order that the common factors 
may be the more accurately estimated, and the estimates 
be more nearly uncorrelated. 
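
The effect of lengthening the battery can be illustrated numerically. The following sketch is an assumption-laden illustration of our own: it takes a hierarchical battery, uses the single-factor regression weights proportional to r/(1 − r²), and computes the correlation between the estimate of g and the estimated specific of the first test as a covariance of two weighted sums of the tests. The batteries of equal loadings .7 are invented purely for the illustration.

```python
def g_s1_correlation(loadings):
    """Correlation between g-hat and s1-hat in a hierarchical battery."""
    n = len(loadings)
    uniq = [(1 - r * r) ** 0.5 for r in loadings]
    S = sum(r * r / (1 - r * r) for r in loadings)

    # Weights of the tests in g-hat, and in s1-hat (found by subtraction).
    b = [r / (1 - r * r) / (1 + S) for r in loadings]
    s1 = [((1.0 if t == 0 else 0.0) - loadings[0] * b[t]) / uniq[0]
          for t in range(n)]

    def test_corr(i, j):
        # Hierarchical battery: r_ij is the product of the two g loadings.
        return 1.0 if i == j else loadings[i] * loadings[j]

    def cov(a, c):
        # Covariance of two weighted sums of standardized tests.
        return sum(a[i] * c[j] * test_corr(i, j)
                   for i in range(n) for j in range(n))

    return cov(b, s1) / (cov(b, b) * cov(s1, s1)) ** 0.5


book_value = g_s1_correlation([0.9, 0.8, 0.7, 0.6])
by_size = {n: g_s1_correlation([0.7] * n) for n in (4, 8, 16)}
print(round(book_value, 3))
print({n: round(c, 3) for n, c in by_size.items()})
```

The first figure refers to the four-test example of this section; the others show the correlation shrinking as tests of the same kind are added.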

8. Bartlett’s method of estimation—M. S. Bartlett (1935, 
1937a, 1938) has proposed to estimate the common factors,
not by the ordinary regression method used above, but by 
a method which minimizes the sum of the squares of a 
man’s specific factors (already, however, maximized by 
the principle of using as few common factors as possible). 

The way in which Bartlett’s estimates differ from 
regression estimates of factors can be very clearly seen by 
thinking in terms of the geometrical picture already used 
in earlier chapters. When the factors outnumber the tests, 
the vectors representing the former are in a space of higher 
dimensions than the test space. 

The individual person is represented in the test space 
by a point, namely that point P whose projections on to 
the test vectors give his test scores. We do not know a 
representative point for this individual in the complete 
factor space, however. His representative point Q may 
be, for all we know, anywhere in the subspace which is
perpendicular to the test space and intersects with it at
P. In these circumstances the regression method takes 
refuge in the assumption that this individual is average 
in all qualities of which we know nothing; that is, in 
all qualities orthogonal to our test space. It therefore 
assumes P to be his point also in the factor space, and 
projects P on to the factor axes to get the estimates of his 
factors. 

Bartlett’s method is equivalent to a different assumption 
about the position of the point Q. Within the complete 
factor space there is a subspace which contains the common 
factors. Of all the positions open to the point Q, Bartlett’s 
method chooses that one which is nearest to the common- 
factor space, and from thence projects on to the common- 
factor vectors. This is equivalent to making the assumption
that this man is not average in the qualities about which
we know nothing, but instead possesses in those unknown 
qualities just those degrees of excellence which bring his 
representative point to the chosen point Q. Because men 
are most frequently near the average, the regression assump- 
tion is more likely. 

9. The geometrical interpretation of Bartlett’s method.—
All this can be most clearly seen (because a perspective
diagram can be made) in the case of estimating one general
factor g only, the hierarchical case. A figure like Figure 30
will illustrate this case, if we take y and z there to be two
tests and x to be the g vector (see page 214).

The man’s representative point in the yz plane is P.
But we do not know his representative point Q in solid
three-dimensional space, only that it is somewhere on the
line P′PP″. The regression method assumes that it is
actually at P, the average, and projects P itself on to the g
line to get the estimate OL of g. Bartlett’s method, on the
other hand, assumes that Q is at that point on P′PP″ where
it most nearly approaches the g line, that is, somewhere
near the position Q in our diagram. Bartlett’s estimate of
g is then represented by OL′.

Now, any point on the line P′PP″, when projected on to
the test vectors y and z, gives the same two test scores.
There is, in general, no point on the line g which does this
exactly. But clearly L′, of all the points on g, will be the
point whose projections most nearly fall on Y and Z, for
L′ is as near as possible to the line P′PP″. That is, the
projection of L′ on to the plane of the tests falls as near
to the point P as is possible. In other words, if we ignore 
the specifics entirely and use only the estimated g in the 
specification of y and z, Bartlett’s estimate comes as near 
as is possible to giving us back the full scores OM and ON. 
If the regression estimate OL is projected on to the lines 
y and z, it will obviously give a worse approximation.

The regression method, in order to recover as much as 
possible of the original scores, would have to make a 
second estimate of them. For the estimates of g repre- 
sented by quantities like OL are not in standard measure. 
Before projecting the point L on to the lines y and z, 
therefore, to recover the original scores as far as possible, 
the regression method would alter the scale of its space 
along the g vector until the quantities like OL were in
standard measure. This would not only change the posi- 
tion of L on the line, it would change the angles which 
the lines in the figure make with one another ; and would 
change them exactly in such a manner that, in the new space, 
the projection of OL on to y and z would fall exactly where 
the Bartlett projections from L’ fall in the present space 
(Thomson, 1938a). 

There is, therefore, no final difference in excellence 
between the two methods in the matter of restoring the 
original scores as fully as possible, but the regression 
method takes two bites at the cherry. On the other hand, 
the regression estimates can be put straight into the speci- 
fication equation of an occupation which is known to 
require just these common factors, whereas here it is the 
Bartlett method which has to have a second shot. 

Both methods have to change their estimate of g when 
a new test is added to the battery. For the man is not 
very likely to have, in the specific of this new test, either 
the average value previously assumed by the regression 
method, or the special value assumed by the Bartlett 
method. But he is more likely to have the former than
the latter, so the Bartlett estimates will change more
than do the regression estimates as the battery grows.
Ultimately, when the number of tests becomes infinite, the
two forms of estimate will agree.

In the case of estimates of one general factor g from a 
hierarchical battery, the Bartlett estimates differ from the 
regression estimates only in scale. They put the candidates 
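in the same order of merit for g as do the regression
estimates, but give them a greater scatter, making the high
g’s higher and the low g’s lower. The formula is—

$\hat{g} = \dfrac{1}{S} \sum_i \dfrac{r_{ig}}{1 - r_{ig}^2}\, z_i$

instead of Spearman’s—

$\hat{g} = \dfrac{1}{1 + S} \sum_i \dfrac{r_{ig}}{1 - r_{ig}^2}\, z_i$

where $S = \sum_i r_{ig}^2 / (1 - r_{ig}^2)$.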

With more than one common factor, the connexion 
between the two kinds of estimate is not so simple (Appen- 
dix, Section 13). The mathematical reader will be able to 
calculate the Bartlett factor estimates from the matrix
formulae given in the Appendix.
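
The relation between the two estimates in the one-factor case can be illustrated with a short Python sketch. The loadings and standardized scores below are hypothetical, chosen only for illustration; the function name `g_estimates` is likewise an assumption of this sketch and not part of the text.

```python
# Sketch: regression and Bartlett estimates of g for a hierarchical
# (one common factor) battery. All numbers are hypothetical.

def g_estimates(loadings, z_scores):
    """Return (regression, bartlett) estimates of g for one person."""
    weights = [r / (1 - r * r) for r in loadings]
    s = sum(r * r / (1 - r * r) for r in loadings)
    weighted_sum = sum(w * z for w, z in zip(weights, z_scores))
    # Spearman's regression estimate divides by 1 + S, Bartlett's by S.
    return weighted_sum / (1 + s), weighted_sum / s

loadings = [0.9, 0.8, 0.7, 0.6]        # hypothetical g saturations
people = [                             # hypothetical standardized scores
    [1.2, 0.8, 1.0, 0.3],
    [-0.5, -0.9, 0.1, -0.4],
    [0.2, 0.0, -0.3, 0.6],
]

S = sum(r * r / (1 - r * r) for r in loadings)
estimates = [g_estimates(loadings, z) for z in people]
for reg, bart in estimates:
    # The ratio is the same for every person: (1 + S) / S.
    print(round(reg, 3), round(bart, 3), round(bart / reg, 3))
```

Because the ratio is a constant greater than one, the order of merit is unchanged and only the scatter is enlarged.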

10. Estimation of oblique factors—In applying the 
method of Section 2 to oblique factors, it is important to 
note that we must use, below the matrix of correlations of 
the tests, in a calculation like that on page 226, the matrix 
of correlations of the primary factors with the tests. 
These are the elements of the structure on the primary
factors, F(Λ′)⁻¹D, transposed so that columns become rows
and vice versa. It would not do to use the structure on the 
reference vectors, which is all that most experimenters 
content themselves with calculating. 

Ledermann’s short cut (Section 3 above) requires considerable
modification in the case of oblique factors. See
Thomson (1949) and the later part of Section 19 of the 
Mathematical Appendix, page 365. 




PART V 
CORRELATIONS BETWEEN PERSONS 


CHAPTER XVI 
REVERSING THE ROLES* 


1. Exchanging the rôles of persons and tests.—In all the
previous chapters the correlations considered have been 
correlations between tests, and the experiments envisaged 
were experiments in which comparatively few tests were 
administered to a large number of persons. For each test 
there would, therefore, be a long list of marks. The whole 
set of marks would make an oblong matrix, with a few 
rows for the tests, and a very large number of columns for 
the persons—we will choose that way of writing it, of the 
two possibilities. 

From such a set of marks we then calculated the
correlation coefficients for each pair of tests, and our
analysis of the tests into factors was based upon these.
In the process of calculating a correlation coefficient we do
such things to the row of marks in each test as finding its
average, and finding its standard deviation. We quite
naturally assume that we can legitimately carry out these
operations. We assume, that is, that in the row of marks
for one test these marks are comparable magnitudes which
at any rate rise and fall with some mental quality even
if they do not strictly speaking measure it in units, like
feet or ounces.

* Explicit reference to correlations between persons seems
to have been made independently by Thomson (1935b) and by
Stephenson (1935), the one being pessimistic, the latter
optimistic. But such correlations had actually been used much
earlier by Burt and by Thomson, and almost certainly by others.
See Burt and Davies, Journ. Exper. Pedag., 1912, 1, 251.

The question we are going to ask in this part of this
book is whether, in the above procedure, the rôles of persons
and of tests can be exchanged (Thomson, 1935b, 75,
Equation 17), and if so what light this throws upon
factorial analysis. Instead of comparatively few tests
(perhaps two or three dozen) and a very large number of
persons, suppose we have comparatively few persons, and
a large number of tests, and find the correlations between
the persons. In that case our matrix of marks would be
oblong in the other direction, with a large number of
rows for the tests, and a small number of columns for
the persons, and each correlation, instead of being as
before between two rows, would be between two columns.
Taking only small numbers for purposes of an explanatory
table, we would have in the ordinary kind of correlations
a table of marks like this :


                    Persons
            x   x   x   x   x   x   x
Tests       x   x   x   x   x   x   x
            x   x   x   x   x   x   x


while for correlations between persons we would have a 
table of marks like this : 


             Persons
            x   x   x
            x   x   x
            x   x   x
Tests       x   x   x
            x   x   x
            x   x   x
            x   x   x


But we meet at once with a serious difficulty as soon as 
we attempt to calculate a correlation coefficient between 
two persons from the second kind of matrix. To do so, 
we must find the average of each column, just as previously 
we found the average of each row for the other kind of 
correlation. But to find the average of each column (by 
adding all the marks in that column together and dividing 
by their number) is to assume that these marks are in 
some sense commensurable up and down the column, 
although each entry is a mark for a different test, on a 
scoring system which is wholly arbitrary in each test 
(Thomson, 1935b, 75-6).

To make this difficulty more obvious, let us suppose
that the first four tests are :

1. A form-board test ; 

2. A dotting test ; 

3. An absurdities test ; 

4. An analogies test. 

In each of these the experimenter has devised some 
kind of scoring system. Perhaps in the form-board test 
he gives a maximum of 20 points, and in the dotting test 
the score may be the number of dots made in half a minute. 
But to find the average of such different things as this is 
palpably absurd, and the whole operation can be entirely 
altered by an arbitrary change like taking the number of 
seconds to solve the form board instead of giving points. 
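
The dependence on an arbitrary scoring system can be shown with a small Python sketch. The marks below are hypothetical, invented only for illustration: two persons take the four tests named above, and the form board is scored first in points and then in seconds taken.

```python
from math import sqrt

def pearson(u, v):
    """Product-moment correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# Hypothetical marks on: form board, dotting, absurdities, analogies.
# Form board scored in points (maximum 20).
person_p_points = [15, 60, 12, 18]
person_q_points = [10, 75, 9, 14]

# The same performances, but the form board is now scored as the
# number of seconds taken (the slower person gets the larger number).
person_p_seconds = [40, 60, 12, 18]
person_q_seconds = [90, 75, 9, 14]

r_points = pearson(person_p_points, person_q_points)
r_seconds = pearson(person_p_seconds, person_q_seconds)
print(round(r_points, 3), round(r_seconds, 3))
```

Nothing about the two persons has changed between the two calculations; only the experimenter's scoring convention for one test has, and yet the correlation between the persons moves.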

2. Ranking pictures, essays, or moods.—This is a very 
fundamental difficulty which will probably make correla- 
tions between persons in the general case impossible to 
calculate. In certain situations, however, it does not arise, 
namely where each person can put the “ tests” in an 
order of preference according to some criterion or judg- 
ment (Stephenson, 1935), and it is with cases of this kind 
that we shall deal in the first place. Usually the “tests”
here are not really different tests like those named above,
but are perhaps a number of children’s essays which have
to be placed in order of merit, or a number of pictures in
order of æsthetic preference, or a number of moods which
the subject has to number, indicating the frequency of
their occurrence in himself. Indeed, the subject might not
only give an order of preference to, say, the essays, but
might give them actual marks, and there would be no
absurdity in averaging the column of such marks, or in
correlating two such columns, made by different persons.

Such a correlation coefficient would show the degree of
resemblance between the two lists of marks given to the
children, or given to a set of pictures according to their
æsthetic value. It would indicate, therefore, a resemblance
between the minds of the two persons who marked the
essays or judged the pictures. A matrix of correlations
between several such persons might look exactly like the
matrices of correlations between tests, and could be
analysed in any of the same ways. What would the
“factors” which resulted from such an analysis mean when
the correlations were between persons ? Take an imaginary
hierarchical case first.

3. The two sets of equations.—In test analysis the common 
factor found was taken to be something called into play 
by each test, the different tests being differently loaded 
with it. The test was represented by an equation such
as—

$z_4 = .6g + .8s_4$

For each of the numerous persons who formed the subjects
of the testing, an estimate was made of his g, and
another estimate could be made of his $s_4$. The different
tests were combined into a weighted battery for this
purpose of estimating a man’s amount of g. His score in
Test 4 would then be made up of his g and $s_4$ inserted in
the above specification equation.

$z_{4.9} = .6g_9 + .8s_{4.9}$

would be the score of the ninth person in Test 4.

By analogy, when we analyse a matrix consisting of 
correlations between persons, we arrive at a set of equations 
describing the persons in terms of common and specific 
factors. Corresponding to a hierarchical battery of tests, 
we could conceivably have a hierarchical team of persons, 
from which we would exclude any person too similar to 
one already included. Each person in the hierarchical 
team would then be made up of a factor he shared with 
everyone else in the team, and a specific factor which was 
his own idiosyncrasy. An equation like—

$z_9 = .4g' + .917s_9$

would now specify the composition of the ninth person.
g’ is something all the persons have, $s_9$ is peculiar to
Person 9. The loadings now describe the person, and the
amount of g’ “possessed” or demanded by each test can
be estimated by exactly the same techniques employed in
Chapter XV. The score which Test 4 would elicit from
Person 9 would be obtained by inserting the g’ and $s_9$
“possessed” by that test into the specification equation
of Person 9, giving—

$z_{9.4} = .4g'_4 + .917s_{9.4}$

This equation is to be compared with the former equation—

$z_{4.9} = .6g_9 + .8s_{4.9}$

Both equations ultimately describe the same score, but
$z_{9.4}$ is not identical with $z_{4.9}$. The raw score X is the same,
but the one standardized z is measured from a different
zero, and in different units, from the other. Disregarding
this for the moment, we see that with the exchange of 
rôles of tests and persons, the loadings and the factors have 
also changed rôles. Formerly, persons possessed different 
amounts of g, and tests were differently loaded with it. 
Now, tests possess different amounts of g’, and persons are 
differently loaded with it. We feel impelled to inquire 
further into the relationships of these complementary 
factors and loadings. 

The test which is most highly saturated with g is that
one which, in terms of Spearman’s imagery, requires most
expenditure of general mental energy, and is least dependent
upon specific neural engines. It correlates more
with its fellow-members of the hierarchical battery than
any other test among them does. It represents best what
is common to them all.

The man, in a hierarchical team of men, who is most
highly saturated with g’ is that man who is most like all
the others. His correlations with them are higher than is
the case for any other man in the team. He is the individual
who best represents the type. But a nearer approach
to the type can be made by a weighted team of men,
just as formerly we weighted a battery of tests to estimate
their common factor.

4. Weighting examiners like a Spearman battery.—Correlations
of this kind between persons were used long before
any idea of what Stephenson has called “inverted factorial
analysis” was present. The author and a colleague found
in the winter of 1924-5 a number of correlations between
experienced teachers who marked the essays written by
schoolboys upon “Ships” (Thomson and Bailes,
1926). One table or matrix of such correlations between
the class teacher and six experienced head masters who
marked the essays independently of one another, was as
follows :

         Te     A     B     C     D     E     F

Te        .   .60   .69   .56   .69   .63   .67
A       .60     .   .53   .50   .54   .55   .68
B       .69   .53     .   .60   .65   .66   .64
C       .56   .50   .60     .   .67   .67   .65
D       .69   .54   .65   .67     .   .54   .69
E       .63   .55   .66   .67   .54     .   .69
F       .67   .68   .64   .65   .69   .69     .

In the article in question, these different markers were 
compared by correlating each with the pool of all the rest. 
These correlations are shown in the first row of the table 
below. 

Purely as an illustrative example, let us make also an 
approximate analysis of this matrix, and take out at any 
rate its chief common factor. On the assumption that it 
is roughly hierarchical, we can use Spearman’s formula— 


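Saturation $= \sqrt{\dfrac{A^2 - A'}{T - 2A}}$ *

where A is the sum of the correlations of that marker with
the others, A′ the sum of the squares of those correlations,
and T the sum of all the correlations in the table (each
pair counted twice).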


More easily we can insert its largest correlation coefficient 
as an approximate communality for each test, and find 
Thurstone’s approximate first-factor loadings (see Chapter 
V, page 70). We get for the saturations or loadings the 
second and third rows of this table : 


                                Te     A     B     C     D     E     F

Correlation with pool of rest  .77   .67   .76   .73   .76   .75   .82
Spearman saturations          .814  .704  .796  .766  .798  .788  .861
Thurstone method               .81   .73   .80   .78   .80   .80   .85


We see that F is the most “typical” examiner of these
essays, in the sense that he is more highly saturated with
what is common to all of them; while A conforms least
to the herd.
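
The second and third rows of the table can be recomputed from the correlation matrix with a short Python sketch. The function names are assumptions of this sketch; the Thurstone step follows the description in the text (largest correlation in each column inserted as an approximate communality), and the diagonal is held as zero so that column sums exclude it.

```python
from math import sqrt

labels = ["Te", "A", "B", "C", "D", "E", "F"]
R = [  # correlations between markers; diagonal left as 0
    [0.00, 0.60, 0.69, 0.56, 0.69, 0.63, 0.67],
    [0.60, 0.00, 0.53, 0.50, 0.54, 0.55, 0.68],
    [0.69, 0.53, 0.00, 0.60, 0.65, 0.66, 0.64],
    [0.56, 0.50, 0.60, 0.00, 0.67, 0.67, 0.65],
    [0.69, 0.54, 0.65, 0.67, 0.00, 0.54, 0.69],
    [0.63, 0.55, 0.66, 0.67, 0.54, 0.00, 0.69],
    [0.67, 0.68, 0.64, 0.65, 0.69, 0.69, 0.00],
]

def spearman_saturations(matrix):
    """sqrt((A^2 - A') / (T - 2A)) for each variable."""
    total = sum(sum(row) for row in matrix)          # T
    result = []
    for row in matrix:
        a = sum(row)                                 # A
        a_sq = sum(r * r for r in row)               # A'
        result.append(sqrt((a * a - a_sq) / (total - 2 * a)))
    return result

def thurstone_first_loadings(matrix):
    """Column sum (with largest r as communality) over sqrt(grand total)."""
    columns = [sum(row) + max(row) for row in matrix]
    grand = sqrt(sum(columns))
    return [c / grand for c in columns]

spearman = spearman_saturations(R)
thurstone = thurstone_first_loadings(R)
for name, s, t in zip(labels, spearman, thurstone):
    print(name, round(s, 3), round(t, 2))
```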

* See Chapter III, page 43.

With the same formula which on page 224 we used to estimate
a man’s g from his test scores, we could here estimate
an essay’s g’ from its examiner scores. That is to say, the
marks given by the different examiners would be weighted
in proportion to the quantities—

$\dfrac{\text{saturation with } g'}{1 - \text{saturation}^2}$


where g’ is that quality of an essay which makes a common
appeal to all these examiners. Their marks (after being
standardized) would therefore be weighted in the proportions
.814/(1 − .814²), etc., that is:

         Te     A     B     C     D     E     F
       2.41  1.40  2.17  1.85  2.20  2.08  3.33
or      .72   .42   .65   .56   .66   .63  1.00


to make global marks for the essays, which could then be
reduced to any convenient scale. If this were done, the
result would be the “best” estimate* of that aspect or
set of aspects of the essay which all these examiners are
taking into account, disregarding all that can possibly
be regarded as idiosyncrasies of individual examiners.
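
As a check on the arithmetic, the weights can be recomputed from the Spearman saturations, and a global mark formed for one essay. This is a sketch only: the standardized marks for the essay are hypothetical, and the variable names are assumptions of the sketch.

```python
labels = ["Te", "A", "B", "C", "D", "E", "F"]
saturations = [0.814, 0.704, 0.796, 0.766, 0.798, 0.788, 0.861]

# Weight for each examiner: saturation / (1 - saturation^2).
weights = [s / (1 - s * s) for s in saturations]

# The same weights expressed relative to the largest (examiner F).
relative = [w / max(weights) for w in weights]

# Hypothetical standardized marks given to one essay by the seven markers.
z_marks = [0.5, -0.2, 0.8, 0.3, 0.6, 0.1, 0.7]
global_mark = sum(w * z for w, z in zip(weights, z_marks))

for name, w, rel in zip(labels, weights, relative):
    print(name, round(w, 2), round(rel, 2))
print("global mark (arbitrary scale):", round(global_mark, 2))
```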

Whether we think it the best estimate in other senses is a
matter of subjective opinion. We may wish the “idiosyncrasies”
(the specific, that is) of a certain examiner to be
given great weight. It clearly would not do, for example,
to exclude Examiner A from the above team merely because
he is the most different from the common opinion of the
team, without some further knowledge of the men and the
purpose of the examination. The “different” member in
a team might, for example, be the only artist on a committee
judging pictures, or the only Democrat in a court
judging legal issues, or the only woman on a jury trying
an accused girl. But in non-controversial matters, if all
are of about equal experience, it is probable that this
system of weighting, restricting itself to what is certainly
common to all, will be most generally acceptable as
fairest.

* Best whether we adopt the regression principle or Bartlett’s.
For if only one “common factor” is estimated, the difference is
one of unit only, and the weighting in the text is the “best” on
both systems.




5. Example from “The Marks of Examiners.” —This 
form of weighting examiners’ marks has probably never 
yet been used in practice. But it has been employed, by 
Cyril Burt, in an inquiry into the marks given by examiners 
(Burt, 1936). As an example, we take the marks given
independently by six examiners to the answer papers of 
fifteen candidates aged about 16, in an examination in 
Latin. (The example is somewhat unusual, inasmuch as 
these candidates were a specially selected lot who had all 
been adjudged equal by a previous examiner, but it will 
serve as an illustration if the reader will disregard that 
fact.) The marks were (op. cit., 20) : 

                    Examiners
Cand.     A     B     C     D     E     F

  1      39    43    52    37    43    40
  2      39    44    50    43    43    46
  3      44    51    55    47    46    46
  4      37    46    43    44    40    43
  5      38    47    55    35    43    45
  6      45    50    54    45    45    49
  7      42    52    51    45    44    46
  8      43    49    53    47    46    46
  9      32    42    49    34    36    38
 10      37    40    48    37    39    42
 11      38    42    47    39    36    39
 12      40    44    50    41    36    42
 13      38    43    50    36    34    41
 14      35    45    49    37    40    40
 15      32    38    41    28    34    34


The correlations between the examiners calculated from 
this table are (the examiner with the highest total correla- 
tion leading) : 


         F     A     B     E     D     C

F        .   .86   .84   .82   .84   .71
A      .86     .   .80   .74   .85   .71
B      .84   .80     .   .80   .81   .67
E      .82   .74   .80     .   .72   .69
D      .84   .85   .81   .72     .   .48
C      .71   .71   .67   .69   .48     .
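
The correlations can be recomputed from the marks with a short Python sketch. The marks are entered as read from the table of fifteen candidates above; since a few cells of that table are hard to read in print, a recomputed coefficient may differ from the published one in the second decimal place. The helper name `pearson` is an assumption of this sketch.

```python
from math import sqrt

marks = {  # one list of 15 candidates' marks per examiner
    "A": [39, 39, 44, 37, 38, 45, 42, 43, 32, 37, 38, 40, 38, 35, 32],
    "B": [43, 44, 51, 46, 47, 50, 52, 49, 42, 40, 42, 44, 43, 45, 38],
    "C": [52, 50, 55, 43, 55, 54, 51, 53, 49, 48, 47, 50, 50, 49, 41],
    "D": [37, 43, 47, 44, 35, 45, 45, 47, 34, 37, 39, 41, 36, 37, 28],
    "E": [43, 43, 46, 40, 43, 45, 44, 46, 36, 39, 36, 36, 34, 40, 34],
    "F": [40, 46, 46, 43, 45, 49, 46, 46, 38, 42, 39, 42, 41, 40, 34],
}

def pearson(u, v):
    """Product-moment correlation between two columns of marks."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

order = ["F", "A", "B", "E", "D", "C"]
corr = {(i, j): pearson(marks[i], marks[j]) for i in order for j in order if i != j}

for i in order:
    row = ["  .  " if i == j else f"{corr[(i, j)]:5.2f}" for j in order]
    print(i, " ".join(row))
```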


If, assuming this table to be hierarchical, we find each
examiner’s saturation with the common factor by Spearman’s
formula, we obtain (with Professor Burt, op. cit.,
294) :

         F     A     B     E     D     C
       .95   .92   .91   .87   .84   .72


In the sense, therefore, of being most typical, F is here 
the best examiner. The proportionate weights to be given 
to each examiner, in making up that global mark for the 
candidate which will best agree with the common factor of 
the team of examiners, are, as before— 


$\dfrac{\text{saturation}}{1 - \text{saturation}^2}$


provided the marks have first been standardized. The
resulting weights, giving F the weight unity, are:

         F     A     B     E     D     C
      1.00   .61   .54   .37   .29   .15


(If the weights are to be applied to the raw or unstandardized
marks, they must each be divided by that
examiner’s standard deviation.)

The marks thus obtained are only an estimate of the
“true” common-factor mark for each child, just as was
the case in estimating Spearman’s g; and the correlation
of these estimates with the “true” (but otherwise undiscoverable)
mark will be, as there (Chapter XV, page 224)—

$r = \sqrt{\dfrac{S}{1 + S}}$

where S is the sum of all the six quantities—

$\dfrac{\text{saturation}^2}{1 - \text{saturation}^2}$

In our case this gives—

$r = .98$
The best examiner’s marking itself correlated with the
hypothetical “true” mark to the amount .95, so that
the improvement is not worth the trouble of weighting;
especially as the simple average of the team of examiners
gives .97. But in some circumstances the additional
labour might be worth while, and there is an interest in
knowing which examiners conform least and which most
to the team, and having a measure of this.
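
The three figures .95, .97 and .98 can be compared in a short Python sketch. The weighted value uses the formula quoted in the text. The value for the simple average is computed here on the assumption that the one-factor (hierarchical) model holds exactly, so that the correlation between any two examiners is the product of their saturations; the text does not say how its .97 was obtained, so that part is an assumption of the sketch.

```python
from math import sqrt

saturations = [0.95, 0.92, 0.91, 0.87, 0.84, 0.72]   # F, A, B, E, D, C

# Weighted team: r = sqrt(S / (1 + S)), S = sum of sat^2 / (1 - sat^2).
S = sum(r * r / (1 - r * r) for r in saturations)
r_weighted = sqrt(S / (1 + S))

# Simple (unweighted) sum of standardized marks, assuming r_ij = r_i * r_j.
n = len(saturations)
total = sum(saturations)
off_diagonal = total ** 2 - sum(r * r for r in saturations)
r_simple = total / sqrt(n + off_diagonal)

r_best_single = max(saturations)
print(round(r_best_single, 2), round(r_simple, 2), round(r_weighted, 2))
```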

After the saturation of each examiner with the hypothet- 
ical common factor has been found, the correlations due 
to that factor can be removed from the table exactly as 
in analysing tests. The residues, as there, may show 
the presence of other factors; and “ specific” resem- 
blances or antagonisms between pairs of examiners, or 
minor factors running through groups of examiners, may 
be detected and estimated. 

In short, all the methods used on correlations between 
tests may be employed on correlations between examiners. 
The tests have come alive and are called examiners, that 
is all. But since the child’s performance, judged by 
the different examiners differently, is here nevertheless 
the same identical performance, our interpretation of the 
results is different. The two cases throw light on one 
another. A Spearman hierarchical battery of tests may 
estimate each child’s general intelligence, which is there 
something in common among the tests. The examiners 
may have been instructed to mark exclusively for what 
they think is general intelligence. In that case their 
weighted team will estimate for each child a general 
intelligence, which is something in common among the 
somewhat discrepant ideas the examiners hold on this 
matter. 

6. Preferences for school subjects.—In the previous sections
we have discussed correlations between examiners
who all mark the same examination papers. The purpose
of their marking these papers is to award prizes, distinctions,
passes, and failures to the candidates. The examiners
are a means to this end; the reason for employing
several of them is to obtain a list of successes and failures 
in which we can have greater confidence. The technique 
described is one which enables us to combine their marks, 
on certain assumptions, to greatest advantage. But it 
can, as in the inquiries described in The Marks of Examiners, 
be turned to compare individual examiners, and to evaluate 
the whole process of examining. 




It is only a step to another, very similar, experiment in 
which objects evaluated by the “ examiners” are not the 
works of candidates in an examination, but are objects 
chosen for the express purpose of gaining an insight into 
the minds of those asked to judge them. Thus we might 
ask several persons each to evaluate on some scale the 
æsthetic appeal of forty or fifty works of art (Stephenson, 
1936b, 353), or ask a number of school pupils each to place 
in order of interest a list of school subjects. 

Stephenson (1936a) asked forty boys and forty girls
attending a higher school in Surrey, England, thus to 
place in order of their preference twelve school subjects 
represented by sixty examination papers, and calculated 
for about half these pupils the correlation coefficients 
between them. To explain the kind of outcome that may 
be expected from such an experiment it will be sufficient 
for us to quote his data for a smaller number of pupils, 
say eight girls, avoiding anomalous cases for simplicity in
a first consideration. The correlations between them were 
as follows (op. cit., 50) : 


Girl      3      4      5      7     17     18     19     20

  3       .    .59    .81    .26   −.02   −.16   −.38   −.35
  4     .59      .    .75    .42   −.23   −.01   −.66   −.03
  5     .81    .75      .    .65   −.29   −.02   −.18   −.08
  7     .26    .42    .65      .   −.50   −.15   −.40   −.17
 17    −.02   −.23   −.29   −.50      .    .60    .52    .72
 18    −.16   −.01   −.02   −.15    .60      .    .09    .79
 19    −.38   −.66   −.18   −.40    .52    .09      .    .40
 20    −.35   −.03   −.08   −.17    .72    .79    .40      .


This table at once suggests that these girls fall into two
types. Girls 3, 4, 5, and 7 correlate positively among
themselves ; they have somewhat similar preferences
among school subjects. Girls 17, 18, 19, and 20 correlate
positively among themselves. But the two groups correlate
negatively with one another. The two types were different
in their order of preference, Type I tending, for example,
to put English and French higher, and Physics and
Chemistry lower, than Type II (though both were agreed
that Latin was about the least lovable of their studies!).

7. A parallel with a previous experiment.—This experiment,
it will be seen, forms a parallel to that inquiry (also
by Stephenson) described in Chapter I, Section 9, where 
tests fell into two types, verbal and pictorial, with correlations
falling there as here into four quadrants. If we call
the two types of school pupil here the linguistic (L) and
the scientific (S), and again use C for the cross-correlations,
the diagram corresponding to that on page 16 of Chapter I
is :

              L   |   C
            ------+------
              C   |   S


The chief difference between the two cases is that there 
the cross-correlations, though smaller than hierarchical 
order in the whole table would demand, were nevertheless 
positive. Here, however, the cross-correlations are 
actually negative.

It is true that the signs of all the correlations in the C 
quadrants can in either case be reversed, by reversing the 
order of the lists either of all the earlier or all the later 
variables (there tests, here pupils). But that is not really 
permissible in either case. We have no doubt which is 
the top and which the bottom end of a list of marks, 
whether in a verbal test or a pictorial test ; and to reverse 
the order of preference given by either the linguistic or the 
scientific pupils would be simply to stultify the inquiry. 
There is, therefore, a real difference between the cases. 
In the present set of correlations something is acting as an 
“interference factor.” 

In Chapter I we explained the correlations and their 
tetrad-differences by the hypothesis of three uncorrelated 
factors g, v, and p required in various proportions by the 
tests, and possessed in various amounts by the children. 
The loadings which indicated the proportions of the factors 
in each test we tacitly assumed to be all positive. Thurstone
expressly says that it is contrary to psychological
expectation to have more than occasional negative loadings.

8. Negative loadings.—Let us endeavour to make at least
a qualitative scheme of factors to express the correlations
between the pupils, factors possessed in various amounts 
by the subjects of the school curriculum, and demanded 
in various proportions by each pupil before he will call 
the subject interesting. One type of pupil weights heavily 
the linguistic factor in a subject in evaluating its interest 
to him. The other type weights heavily the scientific 
factor in a subject in judging its attraction for him. But 
to explain actual negative correlations between pupils we 
must assume that some of the loadings are negative, 
assume, that is, that some of the children are actively 
repelled by factors which attract others. Common sense 
does not think thus. Common sense says that two children 
may put the subjects in opposite orders, even though they
both like them all, provided they don’t like them equally
well. But then common sense is not anxious to analyse
the children into uncorrelated additive factors. If each
child is thus expressed as the weighted sum of various
factors, two children can correlate negatively only if some
of the loadings are negative in the one child and positive
in the other, for the correlation is the inner product of the
loadings. Since Stephenson has found numerous negative
correlations between persons, and since few negative
correlations are reported between tests, we seem here to
have an experimental difference between the two kinds of
correlation, and if ever correlations between persons come
to be analysed as minutely and painstakingly as correlations
between tests, it would seem that the free admission
of negative loadings would be necessary.* The present
matrix can in fact be roughly analysed into two general
factors, one of which has positive loadings in all pupils,
while the other is positively loaded in the one type,
negatively loaded in the other.
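
The argument that a negative correlation requires loadings of opposite sign can be made concrete with a small Python sketch. The loadings are hypothetical and are not an analysis of Stephenson's data: each pupil is given a loading on one factor common to all and on a second factor that is positive in one type and negative in the other.

```python
def implied_correlation(loadings_x, loadings_y):
    """Correlation implied by common-factor loadings: their inner product."""
    return sum(a * b for a, b in zip(loadings_x, loadings_y))

# Hypothetical loadings on (general factor, bipolar factor).
linguistic_pupil_1 = (0.5, 0.7)
linguistic_pupil_2 = (0.6, 0.6)
scientific_pupil = (0.5, -0.7)

within_type = implied_correlation(linguistic_pupil_1, linguistic_pupil_2)
across_types = implied_correlation(linguistic_pupil_1, scientific_pupil)
print(round(within_type, 2), round(across_types, 2))

# With every loading non-negative, no inner product can fall below zero.
all_positive = implied_correlation((0.5, 0.7), (0.5, 0.1))
print(round(all_positive, 2))
```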

* See Stephenson, 1936b, 349.

9. An analysis of moods.—A still more ingenious application
by Stephenson of correlations between persons is in
an experiment in which for each person a “population”
of thirty moods, such as “irascible,” “cheerful,” “sunny,”
were rated for their prevalence and intensity for each of 
ten patients in a mental hospital, and for six normal 
persons (Stephenson, 1936c, 363). This time the correla- 
tion table indicated three types, corresponding to the 
manic-depressives, the schizophrenes, and the normal 
persons, each type correlating positively within itself, but 
negatively or very little with the other types. These 
experiments were only illustrative, and it remains to be 
seen whether factors which will prove acceptable psycho- 
logically will be isolated in persons in the same manner as g, 
and the verbal factor, have been isolated in tests. The 
parallel between the two kinds of correlation and analysis 
is, however, certainly likely to throw light on the nature of 
factors of both kinds. 


CHAPTER XVII 


THE RELATION BETWEEN TEST FACTORS 
AND PERSON FACTORS 


1. Burt’s example, centred both by rows and by columns.—In 
the examples we have just considered, there is no doubt 
that correlations between persons can be calculated without 
absurdity. In the matrix of marks given by a number of examiners
(marking the same paper) to a number of candidates,
either two candidates can be correlated or two examiners. 
The heterogeneity of marks referred to in Chapter XVI, 
Section 1, does not enter as a difficulty. Still keeping to 
such material, let us ask ourselves what the relation is 
between factors found in the one way, and factors found in 
the other. Qualitatively, we have already suggested that 
factors and loadings change rôles in some manner. The 
most determined attempt to find an exact relationship has 
been that made by Cyril Burt, who concludes that, if the 
initial units have been suitably chosen, the factors of the 
one kind of analysis are identical with the loadings of the 
other, and vice versa (Burt, 1937b). The present writer, 
while agreeing that this is so in the very special circum- 
stances assumed by Burt, is of opinion that his is a very 
narrow case, and that the factors considered by Burt are 
not typical of those in actual use in experimental psycho- 
logy. Theoretically, however, Burt’s paper is of very great 
interest. It can be presented to the general reader best
by using Burt’s own small numerical example, based on a 
matrix of marks for four persons in three tests : 


              Persons
              a     b     c     d
         1   −6     2     0     4
Tests    2    3     1    −1    −3
         3    3    −3     1    −1

It will be noticed that this matrix of marks is already
centred both ways. The rows add up to zero, and so do
the columns. The test scores have been measured from
their means, and then thereafter the columns of personal
scores have been measured from their means; or it can
be done persons first, tests second, the end-result being
the same. Burt does not give the matrix of raw scores
from which the above matrix comes.

If we take the doubly centred matrix as he gives it, the 
matrices of variances and covariances formed from it are: 


Test Covariances

           1     2     3
    1     56   −28   −28
    2    −28    20     8
    3    −28     8    20


Person Covariances

           a     b     c     d
    a     54   −18     0   −36
    b    −18    14    −4     8
    c      0    −4     2     2
    d    −36     8     2    26


Notice that in both these matrices the columns add to 
zero, just as they do in the matrices of residues in the 
“ centroid ” process. 
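
Both matrices follow from the doubly centred matrix of marks by summing products, as a short Python sketch shows. As in the text, the "covariances" here are sums of products, not divided by the number of cases; the helper names are assumptions of this sketch.

```python
# Doubly centred marks: rows are Tests 1-3, columns are persons a-d.
M = [
    [-6, 2, 0, 4],
    [3, 1, -1, -3],
    [3, -3, 1, -1],
]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def sums_of_products(rows):
    """Matrix of inner products between every pair of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

test_cov = sums_of_products(M)               # 3 x 3, between tests
person_cov = sums_of_products(transpose(M))  # 4 x 4, between persons

for row in test_cov:
    print(row)
for row in person_cov:
    print(row)
```

Every column of either result sums to zero, which is the property noted above.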

2. Analysis of the covariances.—Burt next proceeds to 
analyse each of these by Hotelling’s method. It seems 
clear that there will exist some relation between the two 
analyses, since the primary origin of each matrix is the 
same table of raw marks, and to show that relation most 
clearly Burt analyses the covariances direct, and not the 
correlations which could be made from each table (by 
dividing each covariance by the square root of the product 
of the two variances concerned). For the two Hotelling 
analyses he obtains (and the centroid factors before
rotation would here be the same) :

Analysis of the Tests

$x_1 = 2\sqrt{14}\,\gamma_1$
$x_2 = -\sqrt{14}\,\gamma_1 + \sqrt{6}\,\gamma_2$
$x_3 = -\sqrt{14}\,\gamma_1 - \sqrt{6}\,\gamma_2$

Analysis of the Persons

$a = -3\sqrt{6}\,f_1$
$b = \sqrt{6}\,f_1 + 2\sqrt{2}\,f_2$
$c = -\sqrt{2}\,f_2$
$d = 2\sqrt{6}\,f_1 - \sqrt{2}\,f_2$
In both cases two factors are sufficient (there will always 
be fewer Hotelling or centroid factors than tests with 
a doubly centred matrix of marks, for a mathematical 
reason). The reader can check that the inner products
give the covariances, e.g.—

Covariance (bd) = $\sqrt{6} \times 2\sqrt{6} - 2\sqrt{2} \times \sqrt{2} = 12 - 4 = 8$
The method of finding Hotelling loadings was described
in Chapter VII, and the reader can readily check that the
coefficients of $\gamma_1$, for example, do act as required by that
method. For if we use numbers proportional to $2\sqrt{14}$,
$-\sqrt{14}$, and $-\sqrt{14}$, namely 1, −½, −½, as Hotelling
multipliers we get:

     56   −28   −28  |   1
    −28    20     8  |  −½
    −28     8    20  |  −½

     56   −28   −28
     14   −10    −4
     14    −4   −10
    ----------------
     84   −42   −42

Proportional to 1, −½, −½ as required.

The largest total (84) is the first “latent root,” and the
multipliers 1, −½, −½ have to be divided, according to
Chapter VII, by the square root of the sum of their squares,
and multiplied by the square root of 84, giving—

$2\sqrt{14}$     $-\sqrt{14}$     $-\sqrt{14}$


3. Factors possessed by each person and by each test.—
Burt then goes on to "estimate," by "regression equations,"
the amount of the factors $\gamma$ possessed by the persons, and
the amount of the factors $f$ possessed by the tests. There is
a misuse of terms here, for with these factors there is no
need to "estimate"; they can be accurately calculated: but
that is a small point. The first three equations can be
solved for the $\gamma$'s (there is indeed one equation too many,
but it is consistent). And the four equations of the second
group can be solved for the $f$'s (again they are consistent).
Since the equations are consistent, we can choose the
easiest pair in each case to solve for the two unknowns.
Choosing the two equations for $z_1$ and $z_2$ we obtain

$$\gamma_1 = \frac{z_1}{2\sqrt{14}}, \qquad \gamma_2 = \frac{z_1 + 2z_2}{2\sqrt{6}}$$

For the other set of factors we naturally choose the
equations in $a$ and $c$, and have

$$f_1 = -\frac{a}{3\sqrt{6}}, \qquad f_2 = -\frac{c}{\sqrt{2}}$$

Now, since we are very liable to confusion in this
discussion, let us remind ourselves what these factors $\gamma$ and
these factors $f$ are. The factors $\gamma$ are factors into which
each test has been analysed. They do not vary in amount
from test to test, but each test is differently loaded with
them. They vary in amount from person to person.

The factors $f$ are factors into which each person has been
analysed. These do not vary in amount from person to
person, but from test to test. Each person is differently
loaded with them, that is, made up of them in different
proportions. The $\gamma$'s are uncorrelated fictitious tests: the
$f$'s are uncorrelated fictitious persons.

Now, from the equations

$$\gamma_1 = \frac{z_1}{2\sqrt{14}}, \qquad \gamma_2 = \frac{z_1 + 2z_2}{2\sqrt{6}}$$

we can find the amount of each factor $\gamma_1$ and $\gamma_2$ possessed
by each person, by inserting his scores $z_1$ and $z_2$ in these
equations, scores which are given in the matrix:

         a     b     c     d
1       -6     2     0     4
2        3     1    -1    -3
3        3    -3     1    -1

Thus the first person possesses $\gamma_1$ in an amount
$-6/(2\sqrt{14})$, because his $z_1$ is -6. For the four persons
and the two factors we find the amounts of these factors
possessed by each person to be:

Factors      γ1          γ2
a          -3/√14         0
b           1/√14        2/√6
c             0         -1/√6
d           2/√14       -1/√6
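The amounts in this table follow from substituting each person's marks into the two solved equations. A minimal Python sketch (the language and the names `SCORES`, `gamma_amounts` and `rebuilt_z3` are assumptions of this illustration) also confirms the remark that the unused third equation is consistent with the other two.

```python
import math

# Marks (z1, z2, z3) of each person in the doubly centred matrix.
SCORES = {"a": (-6, 3, 3), "b": (2, 1, -3), "c": (0, -1, 1), "d": (4, -3, -1)}


def gamma_amounts(z1, z2):
    """Solve the equations for z1 and z2 for the two test factors."""
    gamma1 = z1 / (2 * math.sqrt(14))
    gamma2 = (z1 + 2 * z2) / (2 * math.sqrt(6))
    return gamma1, gamma2


amounts = {person: gamma_amounts(z[0], z[1]) for person, z in SCORES.items()}

# The third equation, z3 = -sqrt(14)*gamma1 - sqrt(6)*gamma2, was not used
# in the solution, so rebuilding z3 from the amounts tests consistency.
rebuilt_z3 = {
    person: -math.sqrt(14) * g1 - math.sqrt(6) * g2
    for person, (g1, g2) in amounts.items()
}

for person in SCORES:
    print(person, amounts[person], rebuilt_z3[person])
```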


4. Reciprocity of loadings and factors.—These are the
amounts of the factors $\gamma$ possessed by the four persons. If
now the reader will compare them with the loadings of
the factors $f$ in the second set of equations on page 265,
he will see a resemblance. The signs are the same, and
the zeros are in the same places. Moreover, the resemblance
becomes identity if we destandardize the factors $f_1$ and $f_2$,
measuring the former in units $\sqrt{84}$ times as large, and the
latter in units $\sqrt{12}$ times as large, 84 and 12 being the
non-zero latent roots of both matrices. In these units let us
use $\varphi_1$ and $\varphi_2$ for them. The equations on page 265 giving
the analysis of the persons then become

$$
\begin{aligned}
a &= -\frac{3}{\sqrt{14}}\,(\sqrt{84}\,f_1) = -\frac{3}{\sqrt{14}}\,\varphi_1 \\
b &= \frac{1}{\sqrt{14}}\,(\sqrt{84}\,f_1) + \frac{2}{\sqrt{6}}\,(\sqrt{12}\,f_2) = \frac{1}{\sqrt{14}}\,\varphi_1 + \frac{2}{\sqrt{6}}\,\varphi_2 \\
c &= -\frac{1}{\sqrt{6}}\,(\sqrt{12}\,f_2) = -\frac{1}{\sqrt{6}}\,\varphi_2 \\
d &= \frac{2}{\sqrt{14}}\,(\sqrt{84}\,f_1) - \frac{1}{\sqrt{6}}\,(\sqrt{12}\,f_2) = \frac{2}{\sqrt{14}}\,\varphi_1 - \frac{1}{\sqrt{6}}\,\varphi_2
\end{aligned}
$$

It will be seen that the loadings of $\varphi_1$ and $\varphi_2$ are identical
with the amounts of $\gamma_1$ and $\gamma_2$ in the table on page 267.
A similar calculation could be made comparing the amounts
of $f_1$ and $f_2$ possessed by the tests with the loadings of $\gamma_1$
and $\gamma_2$ (suitably destandardized) in the analysis of the
tests. As we said at the outset, if suitable units are chosen
for the marks and the factors, the loadings of the personal
equations are the factors of the test equations, and the
factors of the personal equations are the loadings of the
test equations. But only for doubly centred matrices of
marks. It would be wrong to conclude in general that
loadings and factors are reciprocal in persons and tests.

Indeed, even for doubly centred matrices of marks, this 
simple reciprocity holds only for the analysis of the 
covariances and not for analyses of the matrices of corre- 
lations. Except by pure accident (and as it happens, 
Burt’s example is in the case of test correlations such an 
accident), the saturations of the correlation analysis will not 
be any simple function of the loadings of the covariance 
analysis. 
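The reciprocity for the covariance analysis can be checked numerically. The sketch below, in Python (an assumption of this illustration, with hypothetical names throughout), divides each person's loadings on the standardized factors by the square roots of the latent roots 84 and 12, and compares the result with the amounts of the two test factors computed from the marks.

```python
import math

# Marks (z1, z2, z3) of each person in the doubly centred matrix.
SCORES = {"a": (-6, 3, 3), "b": (2, 1, -3), "c": (0, -1, 1), "d": (4, -3, -1)}

# Loadings of each person on the standardized factors f1 and f2.
PERSON_LOADINGS = {
    "a": (-3 * math.sqrt(6), 0.0),
    "b": (math.sqrt(6), 2 * math.sqrt(2)),
    "c": (0.0, -math.sqrt(2)),
    "d": (2 * math.sqrt(6), -math.sqrt(2)),
}
LATENT_ROOTS = (84, 12)


def gamma_amounts(z1, z2):
    """Amounts of the two test factors possessed by one person."""
    return z1 / (2 * math.sqrt(14)), (z1 + 2 * z2) / (2 * math.sqrt(6))


amounts = {p: gamma_amounts(z[0], z[1]) for p, z in SCORES.items()}
rescaled = {
    p: tuple(load / math.sqrt(root) for load, root in zip(loads, LATENT_ROOTS))
    for p, loads in PERSON_LOADINGS.items()
}
largest_gap = max(
    abs(x - y) for p in SCORES for x, y in zip(rescaled[p], amounts[p])
)
print(largest_gap)
```

The largest discrepancy is of the order of rounding error only, so the rescaled loadings and the factor amounts coincide for this doubly centred matrix.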

5. Special features of a doubly centred matrix.—But in
any case, a matrix of marks which has been centred both
ways is one in which only a very special kind of residual
association between the variables is present. Most of what
we commonly call the association or resemblance between
either tests or persons, the amount of which we gauge by
the correlation coefficient, is due to something over and
above this. We can write down an infinity of possible raw
matrices from which Burt's doubly centred matrix might
have come. To the rows of the latter matrix we can add
any quantities we like without in the slightest altering the
correlations between the tests, but making enormous
changes in the correlations between the persons. Let us,
for example, add 10 to the top row, 13 to the middle row,
and 16 to the bottom row. There results the matrix:

         a     b     c     d
1        4    12    10    14
2       16    14    12    10      (A)
3       19    13    17    15


This gives as correlations between the persons:

          a      b      c      d
a       1.00    .75    .84   -.14
b        .75   1.00    .28   -.76
c        .84    .28   1.00    .42
d       -.14   -.76    .42   1.00
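These correlations can be recomputed from the matrix obtained by adding 10, 13 and 16 to the rows of the doubly centred marks. The sketch below uses Python (an assumption of this illustration) and hypothetical helper names; it also adds 5, 2, 8 and 9 to the four columns, the constants used in the next paragraph, to show that the correlations between persons are untouched by such additions.

```python
import math


def correlation(x, y):
    """Product-moment correlation of two equally long lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def person_correlations(marks):
    """Correlate the columns (persons) of a tests-by-persons matrix."""
    persons = list(zip(*marks))
    return [[correlation(p, q) for q in persons] for p in persons]


# Doubly centred marks with 10, 13 and 16 added to the three rows.
MATRIX_A = [[4, 12, 10, 14], [16, 14, 12, 10], [19, 13, 17, 15]]
# The same matrix with 5, 2, 8 and 9 added to the four columns.
shifted = [[v + c for v, c in zip(row, (5, 2, 8, 9))] for row in MATRIX_A]

before = person_correlations(MATRIX_A)
after = person_correlations(shifted)
for row in before:
    print(["%.3f" % v for v in row])
```

The computed values agree with the table to within a unit in the second decimal place.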




Next, without changing this matrix of correlations
between persons in the slightest, we can add any quantities
we like to the columns of the matrix of marks, and produce
an infinity of different matrices of correlations between
tests. If, for example, we add 5, 2, 8, and 9 to the four
columns, we have a matrix of raw marks:

         a     b     c     d
1        9    14    18    23
2       21    16    20    19      (B)
3       24    15    25    24

This has the same correlations between persons, but the
correlations between tests are now:

         1     2     3
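The correlations between the tests in matrix (B) follow directly from its three rows. A minimal sketch in Python (an assumption of this illustration; `correlation` and `test_corr` are hypothetical names) computes them from the marks as printed.

```python
import math


def correlation(x, y):
    """Product-moment correlation of two equally long lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


# Matrix (B): rows are Tests 1-3, columns are persons a-d.
MATRIX_B = [[9, 14, 18, 23], [21, 16, 20, 19], [24, 15, 25, 24]]

test_corr = [[correlation(r, s) for s in MATRIX_B] for r in MATRIX_B]
for row in test_corr:
    print(["%.2f" % v for v in row])
```

The correlation between Tests 1 and 2 comes out negative, in agreement with what is said of (B) later in this section.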
Or instead, by adding suitable numbers to the columns
and to the rows, we might have arrived at the matrix:

         a     b     c     d
1       44    48    18    10
2       63    57    27    13      (C)
3       58    48    24    10

or equally well at:

         a     b     c     d
1       35    45    37    43
2       34    34    26    26      (D)
3       34    30    28    28


The order of merit of the persons in each test is quite 
different in each of these matrices. The order of difficulty 
of the tests for each person is quite different in each. If 
we consider the ordinary correlation between Tests 1 and 2, 
we find that it is negative in (B), zero in (D) and positive 
in (C), yet all of these matrices reduce to Burt’s matrix 
when centred both ways. It is clear that they contain 
factors of correlation which are absent in the doubly 
centred matrix. 
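Both claims in this paragraph can be verified by centring each raw matrix both ways. The Python sketch below is an assumption of this illustration (as are the names `centre_both_ways` and `covariance_tests_1_and_2`); the entries of (B), (C) and (D) are taken as read from this section.

```python
BURT = [[-6, 2, 0, 4], [3, 1, -1, -3], [3, -3, 1, -1]]
RAW = {
    "B": [[9, 14, 18, 23], [21, 16, 20, 19], [24, 15, 25, 24]],
    "C": [[44, 48, 18, 10], [63, 57, 27, 13], [58, 48, 24, 10]],
    "D": [[35, 45, 37, 43], [34, 34, 26, 26], [34, 30, 28, 28]],
}


def centre_both_ways(marks):
    """Subtract row and column means and add back the grand mean."""
    n_rows, n_cols = len(marks), len(marks[0])
    row_means = [sum(row) / n_cols for row in marks]
    col_means = [sum(col) / n_rows for col in zip(*marks)]
    grand = sum(row_means) / n_rows
    return [
        [round(marks[i][j] - row_means[i] - col_means[j] + grand, 6)
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


def covariance_tests_1_and_2(marks):
    """Sum of products of deviations; its sign is that of the correlation."""
    first, second = marks[0], marks[1]
    m1, m2 = sum(first) / len(first), sum(second) / len(second)
    return sum((x - m1) * (y - m2) for x, y in zip(first, second))


centred = {name: centre_both_ways(m) for name, m in RAW.items()}
signs = {name: covariance_tests_1_and_2(m) for name, m in RAW.items()}
print(centred["C"], signs)
```

All three matrices reduce to the same doubly centred matrix, while the product sum for Tests 1 and 2 is negative in (B), zero in (D) and positive in (C).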

The averages of the rows and the columns of (C) are as
follows:

               a     b     c     d    Average
1             44    48    18    10      30
2             63    57    27    13      40
3             58    48    24    10      35

Average       55    51    23    11


The correlation between two tests is clearly influenced 
very much by the fact that here the person a is so much 
cleverer than the person d. Similarly, the correlation 
between two persons is influenced by the fact that Test 1 
is more difficult than Test 2. As soon as the matrix is 
centred both ways, all the correlation due to these and 
similar influences is almost extinguished. Centred by rows,
(C) becomes:

        14    18   -12   -20
        23    17   -13   -27
        23    13   -11   -25

and all the tests are equally difficult on the average.
Centred by columns as well, it becomes:

        -6     2     0     4
         3     1    -1    -3
         3    -3     1    -1

and not only are all the tests equally difficult on the average, 
but all the persons are equally clever on the average. It 
is to the covariances still remaining that Burt’s theorem 
about the reciprocity of factors and loadings applies. It 
does not apply to the full covariances of the matrix centred 
only one way, in the manner usually meant when we speak 
of covariances or of correlations. 

6. An actual experiment.—In Part III of Burt's The
Factors of the Mind (London, 1940) his principle of
reciprocity of tests and persons is seen in an actual illustrative
experiment on the distribution of temperamental types.

This experiment was on twelve women students,
selected because the temperamental assessments made by
various judges on them were more unanimous than in the
case of the other students. Each, therefore, was a
well-marked temperamental type. They were assessed for the
eleven traits seen in the table below. The assessments
over each trait were standardized, i.e. measured in such
units and from such an origin that their sum was zero and
the sum of their squares twelve, the number of persons,
so that the group was (artificially) made equal in an
average of sociability, sex, etc. The correlations between
the traits were then calculated and centroid factors taken
out, the first two of which I shall call by the Roman letters
u and v. These two are possessed in some amount by
each of the persons and required, in degrees indicated by
the saturation coefficients, by each of the traits. These
saturation coefficients have been found by analysis of the
correlations between the traits.

Now according to the reciprocity principle, if we analyse
instead the correlations between the persons, find factors
which we may indicate by Greek letters, and measure the
amounts of these possessed by the eleven traits, these
amounts ought to be the same as the saturation coefficients
of the Roman factors u, v, etc.

Burt therefore further standardizes the assessments, 
by persons this time, and finds the total scores on each 
trait, which are, by a property of centroid factors (see 
page 217) proportional to the amounts of a centroid Greek 
factor possessed by the eleven traits; and the test of the 
reciprocity hypothesis is to see whether these totals are 
similar to the saturations of a Roman factor. The figures 
(from Burt’s page 405) are given in the table below : 


                     Saturations of the      Amounts of the
                       Roman factors          Greek factor

                        u         v                α
Sociability           .671      .508             .587
Sex                   .878      .213             .489
Assertiveness         .827      .483             .878
Joy                   .951      .233             .297
Anger                 .824      .241             .280
Curiosity             .780     -.268             .001
Fear                  .898     -.159            -.089
Sorrow                .259     -.104            -.337
Tenderness            .564     -.667            -.447
Disgust               .830     -.490            -.489
Submissiveness        .412     -.685            -.525


Clearly the amounts of α do not correspond to the
saturations of u; nor should they, for a general factor
has already been eliminated by the double standardization.
They do, however, agree reasonably well with the
saturations of the second Roman factor v, and confirm Burt's
prediction that, even in this sample, and with factors
which are not exactly principal components, the
reciprocity principle would still hold approximately.



PART VI 
THE INFLUENCE OF SELECTION 


CHAPTER XVIII 


THE INFLUENCE OF UNIVARIATE SELECTION 
ON FACTORIAL ANALYSIS* 


1. Univariate selection.—All workers with intelligence
tests know, or ought to know, that the correlations found
between tests, or between tests and outside criteria, depend
to a very great extent indeed upon the homogeneity or
heterogeneity of the sample in which the correlations were
measured. If, to take the usual illustration, we measure
the correlation between height and weight in a sample of
the population which includes babies, children, and
grown-ups, we shall obviously get a very high result. If we
confine our measurement to young people in their 'teens,
we shall usually get a smaller value for the coefficient of
correlation. If we make the group more homogeneous
still, taking, say, only boys, and all of the same race and
exactly the same age, the correlation of height and weight
will be still less.† Through all these changes towards
greater homogeneity in age, the standard deviation (or its
square, the variance) of height has also been sinking, and
the standard deviation of weight also. The formulæ which
describe these changes were given in 1902 by Professor
Karl Pearson,‡ and when the selection of the persons
forming the sample is made on the basis of one quality
only, these formulæ can be put into the following very
simple form.

* Thomson, 1937 and 1938b.
† Greater homogeneity need not necessarily, in the mathematical
sense, decrease correlation, and occasionally it does not do so in
psychological experiments. But it almost always does so.
‡ These formulæ are not, as was once thought, only applicable if
the distributions are normal (see Lawley, 1948c, where the necessary
conditions are stated). They have been found by trial to give good
results even when the sample has been made by cutting off a tail, or
both tails, of the distribution.

Let the standard deviations of (say) four qualities be
in the complete population (we must, of course, in each
case define what we mean by the complete population, as
for example all living adults who were born in Scotland)
given by $\Sigma_1$, $\Sigma_2$, $\Sigma_3$, and $\Sigma_4$, and their correlations by
$R_{12}$, $R_{13}$, etc. Now let a selection of persons be made who
are more homogeneous in the first quality, say, in an
intelligence test which has been given to them all, so that
its standard deviation in the sample is only $\sigma_1$, and write

$$p_1 = \frac{\sigma_1}{\Sigma_1}$$

The smaller $p_1$ is, the more homogeneous the group is in
intelligence-test score. If we write

$$q_1 = \sqrt{1 - p_1^2}$$

$q_1$ will be larger, the greater the shrinkage in intelligence
score-scatter from $\Sigma_1$ to $\sigma_1$. We shall call $q_1$ the
"shrinkage" of the quality No. 1 in the sample.

The other qualities 2, 3, and 4, being correlated with the
first, will tend to shrink with it, and their expected
shrinkages $q_2$, $q_3$, and $q_4$ can be calculated from the formula

$$q_i = q_1 R_{1i}$$

For the sort of reason indicated earlier in this paragraph,
the correlations of the four qualities (which we are for
simplicity in exposition assuming to be positively correlated
in the whole population) will also alter, according to the
formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$


2. Elementary proof.—This formula can be readily
proved, for the case where the average is unchanged, by
using our geometrical model of correlation, in which tests
or other variables are represented by lines all crossing each
other at the "average man," and at angles with one
another whose cosines equal the correlation coefficients
between the tests (see Chapter VI).

In this perspective figure let OA, OB, and OC be three
lines in three-fold space representing three tests. The
triangle ABC is in a plane at right angles to OA. Write

$$\cos\alpha = \cos BOA = R_{12}, \qquad \cos\beta = \cos COA = R_{13}, \qquad \cos\gamma = \cos BOC = R_{23}$$

Take the distance OA as unity. Each test is standardized,
so that its standard deviation is unity. Now let the
standard deviation of Test 1 be reduced so that it becomes
$p_1 = OD$. This means, in our geometrical model, that the
whole three-fold space in which our lines OA, OB, and OC
exist is compressed from A towards O, and every line
parallel to this is shortened in the same way. The point B
moves up to E, and the point C to F. The whole triangle
ABC is lifted up, remaining at right angles to the line OA,
to a new position DEF. The test lines OB and OC become
OE and OF. The angle $\gamma = BOC$ has become the angle
$\gamma' = EOF$, and $\cos\gamma'$ represents the new correlation
coefficient between Tests 2 and 3. Our object is to find
$\cos\gamma'$ in terms of the known quantities $\alpha$, $\beta$, $\gamma$, and $p_1$.
One method is to express $BC^2$ in terms of the triangle BOC,
and $EF^2$ in terms of the triangle EOF, and equate them,
since BC = EF.

(Figure 31.)

First note that

$$OB^2 - OE^2 = OA^2 - OD^2 = 1 - p_1^2 = q_1^2$$

and similarly

$$OC^2 - OF^2 = q_1^2$$

Also $p_2 = OE/OB$, and $p_3 = OF/OC$. Further,

$$q_2^2 = 1 - p_2^2 = \frac{OB^2 - OE^2}{OB^2} = \frac{q_1^2}{OB^2}$$

and similarly

$$q_3^2 = \frac{q_1^2}{OC^2}$$

Now, since

$$BC^2 = OB^2 + OC^2 - 2\,OB \cdot OC\,\cos\gamma$$

and

$$BC^2 = EF^2 = OE^2 + OF^2 - 2\,OE \cdot OF\,\cos\gamma'$$

we have, subtracting,

$$
\begin{aligned}
0 &= (OB^2 - OE^2) + (OC^2 - OF^2) - 2\,OB \cdot OC\,\cos\gamma + 2\,OE \cdot OF\,\cos\gamma' \\
  &= q_1^2 + q_1^2 - 2\,OB \cdot OC\,\cos\gamma + 2\,OE \cdot OF\,\cos\gamma'
\end{aligned}
$$

whence

$$
\cos\gamma' = \frac{OB \cdot OC\,\cos\gamma - q_1^2}{OE \cdot OF}
= \frac{\cos\gamma - \dfrac{q_1}{OB} \cdot \dfrac{q_1}{OC}}{\dfrac{OE}{OB} \cdot \dfrac{OF}{OC}}
= \frac{\cos\gamma - q_2 q_3}{p_2 p_3}
$$

or

$$r_{23} = \frac{R_{23} - q_2 q_3}{p_2 p_3}$$
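The geometrical argument can also be tested numerically: place three unit vectors at the required angles, shorten every component parallel to OA in the ratio $p_1$, and compare the new cosine with the formula. The sketch below is in Python (an assumption of this illustration) with hypothetical function names; the trial correlations .69, .75, .54 and the ratio 10.2/16.5 are borrowed from the numerical example of the next section.

```python
import math


def compressed_cosine(r12, r13, r23, p1):
    """Cosine between test lines 2 and 3 after compressing along line 1."""
    # Unit vectors: Test 1 lies along the first axis, Test 2 in the plane
    # of the first two axes, Test 3 wherever the three cosines require.
    s12 = math.sqrt(1 - r12 ** 2)
    y3 = (r23 - r12 * r13) / s12
    z3 = math.sqrt(1 - r13 ** 2 - y3 ** 2)
    test2 = (r12, s12, 0.0)
    test3 = (r13, y3, z3)
    # Shorten every component parallel to OA in the ratio p1.
    e = (p1 * test2[0], test2[1], test2[2])
    f = (p1 * test3[0], test3[1], test3[2])
    dot = sum(a * b for a, b in zip(e, f))
    length_e = math.sqrt(sum(a * a for a in e))
    length_f = math.sqrt(sum(a * a for a in f))
    return dot / (length_e * length_f)


def selection_formula(r12, r13, r23, p1):
    """The same quantity from r23 = (R23 - q2*q3) / (p2*p3)."""
    q1 = math.sqrt(1 - p1 ** 2)
    q2, q3 = q1 * r12, q1 * r13
    p2, p3 = math.sqrt(1 - q2 ** 2), math.sqrt(1 - q3 ** 2)
    return (r23 - q2 * q3) / (p2 * p3)


geometric = compressed_cosine(0.69, 0.75, 0.54, 10.2 / 16.5)
algebraic = selection_formula(0.69, 0.75, 0.54, 10.2 / 16.5)
print(geometric, algebraic)
```

Unit vectors are used in place of the points B and C of the figure; this does not affect the angle, since a cosine is unchanged by altering the lengths of the lines.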

3. A numerical example.—Let us define our "whole
population" as all the eleven-year-old children in
Massachusetts, and let us suppose (the numbers are entirely
fictitious) that the standard deviations of all their scores
in four tests are:

1. Stanford-Binet test        16.5 = $\Sigma_1$
2. The X reading test         24.9 = $\Sigma_2$
3. The Y arithmetic test      27.3 = $\Sigma_3$
4. The Z drawing scale        14.2 = $\Sigma_4$

while the correlations between these four, in a State-wide
survey, are (these are the R correlations):

          1      2      3      4
1         .     .69    .75    .32
2        .69     .     .54    .18
3        .75    .54     .     .06
4        .32    .18    .06     .


Now let a sample of Massachusetts eleven-year-olds be
taken who are less widely scattered in intelligence, with
a standard deviation in their Stanford-Binet scores of
only 10.2. How will all the other quantities listed above
tend to alter in this sample? Here $p_1 = 10.2/16.5 = .618$,
and we have, using the formulæ quoted, the following:

$$q_1 = \sqrt{1 - .618^2} = .786$$

and from $q_i = q_1 R_{1i}$ we have the other shrinkages $q$, and
thence the coefficients $p$ and the new standard deviations
$\sigma = p\Sigma$:

          1      2      3      4
q       .786   .542   .590   .252
p       .618   .840   .808   .968
σ       10.2   20.9   22.1   13.7

The formula for $r_{ij}$ then enables us at once to calculate
the correlations to be expected in the sample, namely:

          1      2      3      4
1         .    .509   .574   .204
2       .509     .    .325   .054
3       .574   .325     .   -.113
4       .204   .054  -.113     .

The greater homogeneity in the sample has made all the
correlation coefficients smaller, and has indeed made $r_{34}$
become negative.

The reader should note that these standard deviations
and correlations are what result from selecting on the
Stanford-Binet test, letting the other changes happen in
consequence. It would be quite a different matter to select on
the reading test. Even if we did so, so as to reduce the
reading test standard deviation from 24.9 to 20.9 as
above, the other changes would be quite different. The
Stanford-Binet standard deviation would, for example,
not be reduced to 10.2 but only to 15.3. And $r_{13}$
would not be .574, but .722. The difference, in terms of
our Figure 31, is that whereas selecting the Stanford-Binet
test corresponds to shortening the line OA and with it all
parallel distances in the space, selecting the reading test
corresponds to shortening OB and all distances parallel to
it: quite a different distortion of the space.
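The whole numerical example can be reproduced in a few lines. The following Python sketch is an assumption of this illustration (the function `select_on` and its argument names are hypothetical); it applies $q_i = q_s R_{si}$, $p_i = \sqrt{1 - q_i^2}$ and $r_{ij} = (R_{ij} - q_i q_j)/(p_i p_j)$ with any one variable $s$ as the directly selected one, so that both the Stanford-Binet case and the reading-test case can be run.

```python
import math

SIGMA = [16.5, 24.9, 27.3, 14.2]   # population standard deviations
R = [
    [1.00, 0.69, 0.75, 0.32],
    [0.69, 1.00, 0.54, 0.18],
    [0.75, 0.54, 1.00, 0.06],
    [0.32, 0.18, 0.06, 1.00],
]


def select_on(R, sigma, selected, new_sd):
    """Expected sample values when one variable is cut down to new_sd."""
    n = len(R)
    p_sel = new_sd / sigma[selected]
    q_sel = math.sqrt(1 - p_sel ** 2)
    q = [q_sel * R[selected][i] for i in range(n)]       # shrinkages
    p = [math.sqrt(1 - qi ** 2) for qi in q]
    sample_sd = [pi * si for pi, si in zip(p, sigma)]
    r = [[(R[i][j] - q[i] * q[j]) / (p[i] * p[j]) for j in range(n)]
         for i in range(n)]
    return q, p, sample_sd, r


# Selection on the Stanford-Binet test (index 0), reduced to 10.2.
q, p, sd, r = select_on(R, SIGMA, selected=0, new_sd=10.2)
# Selection on the reading test (index 1), reduced to 20.9.
q_b, p_b, sd_b, r_b = select_on(R, SIGMA, selected=1, new_sd=20.9)

print([round(v, 3) for v in q], [round(v, 3) for v in p])
print([round(v, 1) for v in sd], round(r[2][3], 3))
print(round(sd_b[0], 1), round(r_b[0][2], 3))
```

Small differences in the third decimal place from the tabulated figures are to be expected, since the tables were worked with rounded intermediate values.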

4. From sample to population.—In the above numerical
example we supposed that the standard deviations and
correlation coefficients were known in the whole
population of Massachusetts eleven-year-old children, and asked
what they would become in a sample with a smaller scatter
in the Stanford-Binet score. The problem might, however,
be reversed, in which case, with a little care, the same
formula can be used.

Let us suppose that we know from experiment the above
facts about the sample (the standard deviations 10.2,
20.9, 22.1, 13.7, and all the correlation coefficients in the
table, .509, .574, etc.) and that we know further that the
standard deviation of Stanford-Binet scores in the whole
population in question is 16.5. The sample we have
worked with is obviously a biased one, restricted in range
of Stanford-Binet scores, and we wish to estimate what our
correlation coefficients would have been if we had tested
all Massachusetts eleven-year-olds, or, at least, an
unbiased sample. We want, indeed, to work the above
example backwards.

The quantity $p_1$ is, in this direction, greater than unity,
namely

$$p_1 = 16.5/10.2 = 1.618 \qquad\text{and}\qquad q_1^2 = 1 - p_1^2 = -1.617$$

The quantity $q_1$ is therefore the square root of a minus
quantity, which we express as

$$q_1 = \sqrt{1.617}\,i = 1.272i, \qquad\text{where } i = \sqrt{-1}$$

The other $q$'s can be got from $q_1$ by the same formula as
before, namely $q_i = q_1 R_{1i}$, where $R$ now means a correlation
coefficient in the sample. Thus

$$
\begin{aligned}
q_2 &= q_1 R_{12} = 1.272i \times .509 = .647i \\
q_3 &= q_1 R_{13} = 1.272i \times .574 = .730i
\end{aligned}
$$

Then

$$p_2^2 = 1 - q_2^2 = 1 + .647^2 \ (\text{for } i^2 = -1) = 1.419, \qquad p_2 = 1.191$$

and similarly $p_3 = 1.238$.

We then have

$$r_{23} = \frac{R_{23} - q_2 q_3}{p_2 p_3} = \frac{.325 - .647i \times .730i}{1.191 \times 1.238} = \frac{.325 + .472}{1.475} = .54$$

as in the table for the population. In this way that table
can be completely reconstituted. It is then, of course,
only an estimate and, moreover, an estimate based on the
assumption that our sample differs from the population
only by reason of one of the four variables (namely, the
Stanford-Binet score) being restricted, deliberately or
accidentally, the other restrictions being supposed to have
followed sympathetically by reason of the correlations.
In few practical examples can we be sure of the mode of
selection.
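The backward calculation, with its imaginary shrinkages, can be followed with ordinary complex arithmetic. The sketch below uses Python's `cmath` module (the choice of language is an assumption of this illustration, and the variable names are hypothetical) and starts from the rounded sample correlations of the table.

```python
import cmath

# Sample correlations taken from the table for the restricted sample.
SAMPLE_R = {(1, 2): 0.509, (1, 3): 0.574, (2, 3): 0.325}

p1 = 16.5 / 10.2                  # greater than unity in this direction
q1 = cmath.sqrt(1 - p1 ** 2)      # purely imaginary, about 1.272i
q2 = q1 * SAMPLE_R[(1, 2)]
q3 = q1 * SAMPLE_R[(1, 3)]
p2 = cmath.sqrt(1 - q2 ** 2)      # real again, because q2**2 is negative
p3 = cmath.sqrt(1 - q3 ** 2)

# Estimated population correlations; the imaginary parts cancel.
population_r23 = ((SAMPLE_R[(2, 3)] - q2 * q3) / (p2 * p3)).real
population_r12 = ((SAMPLE_R[(1, 2)] - q1 * q2) / (p1 * p2)).real

print(q1, p2.real, p3.real)
print(round(population_r23, 3), round(population_r12, 3))
```

The two estimates return, to within rounding, the .54 and .69 of the population table.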

5. Variance of differences between scores.—Our numerical
example enables us to illustrate a very useful fact, that the
variance of the differences between the scores in two tests
is independent of the amount of selection if both tests have
been equally shrunk, and is reasonably constant when this
condition is not too much departed from.

For example, $\sigma^2$ for the differences between the scores in
Tests 2 and 3 would be, by the formula

$$\sigma_{(2-3)}^2 = \sigma_2^2 + \sigma_3^2 - 2 r_{23}\sigma_2\sigma_3$$

equal in the population to

$$24.9^2 + 27.3^2 - 2 \times 24.9 \times 27.3 \times .54 = 631.15$$

and in the sample to

$$20.9^2 + 22.1^2 - 2 \times 20.9 \times 22.1 \times .325 = 625.0$$

that is, almost the same, although $p_2$ does not quite equal
$p_3$. This fact gives another method of estimating a
population correlation if the sample variance of the
differences can be calculated, and if the standard
deviations in the population are known or can be guessed. For
example, suppose a worker with the sample calculated
from his data the value

$$\sigma_{(2-3)}^2 = 625$$

and had reason to think that in the population, or in some
other sample, the standard deviations were 25 and 27 (as
they nearly are in our example), he could estimate the
unknown correlation as

$$\frac{2 \times 25 \times 27 - 625}{2 \times 25 \times 27} = .537$$

Actually it was .54. But this method would fail badly if
the quantities $p_2$ and $p_3$ were markedly different (Emmett,
1951, B.J.P.Statist., 4, (1)).
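The arithmetic of this section is easily repeated. In the Python sketch below (the language and the function names are assumptions of this illustration) the variance formula is also inverted exactly, giving $r = (\sigma_2^2 + \sigma_3^2 - \sigma_{(2-3)}^2)/(2\sigma_2\sigma_3)$. The expression used in the text, which yields .537, corresponds to replacing $\sigma_2^2 + \sigma_3^2$ by $2\sigma_2\sigma_3$, a substitution that is only approximate unless the two standard deviations are equal.

```python
def difference_variance(sd_a, sd_b, r):
    """Variance of the difference between two correlated scores."""
    return sd_a ** 2 + sd_b ** 2 - 2 * r * sd_a * sd_b


def correlation_from_difference(sd_a, sd_b, diff_var):
    """Exact inversion of the variance formula for the correlation."""
    return (sd_a ** 2 + sd_b ** 2 - diff_var) / (2 * sd_a * sd_b)


population = difference_variance(24.9, 27.3, 0.54)
sample = difference_variance(20.9, 22.1, 0.325)
exact = correlation_from_difference(25, 27, 625)
shortcut = (2 * 25 * 27 - 625) / (2 * 25 * 27)

print(round(population, 2), round(sample, 2))
print(round(exact, 3), round(shortcut, 3))
```

The two ways of estimating differ here only by $(25 - 27)^2 / (2 \times 25 \times 27)$, that is, by about .003.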


6. Selection and partial correlation.—If a sample is made
completely homogeneous in the Stanford-Binet test, clearly
$p_1 = 0$ and $q_1 = 1$. The same formulæ then give us:

          1      2      3      4
q         1     .69    .75    .32
p         0    .524   .438   .904
σ         0    18.0   11.9   12.8

and the resulting correlation coefficients, which in this case
are called "coefficients of partial correlation for constant
Stanford-Binet score," are, by the same formula:

          1      2      3      4
1         .      .      .      .
2         .      .    .098  -.086
3         .    .098     .   -.455
4         .   -.086  -.455     .

The correlations of the Stanford-Binet test with the
others are given by the formula as 0/0, that is,
indeterminate. That they are really zero is seen from the fact
that if $p_1$ is taken as not quite zero, but very small,
these correlations come out by the formula as very small.
They vanish with $p_1$.
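Both statements, that complete selection yields partial correlations and that the correlations with the selected test vanish with $p_1$, can be checked by direct evaluation. The sketch below is in Python (an assumption of this illustration, with hypothetical function names) and uses $p_i = \sqrt{1 - R_{1i}^2}$ throughout. Evaluated in this way the three partial coefficients come to about .047, -.060 and -.287; the tabulated figures appear to be close to what results if $1 - R_{1i}^2$ is used in the denominator in place of its square root, so they are best checked against the formula.

```python
import math

R = [
    [1.00, 0.69, 0.75, 0.32],
    [0.69, 1.00, 0.54, 0.18],
    [0.75, 0.54, 1.00, 0.06],
    [0.32, 0.18, 0.06, 1.00],
]


def partial(R, i, j, k):
    """Correlation of variables i and j with variable k held constant."""
    return (R[i][j] - R[i][k] * R[j][k]) / math.sqrt(
        (1 - R[i][k] ** 2) * (1 - R[j][k] ** 2)
    )


def after_selection(R, p1):
    """Selection formula with variable 0 as the directly selected one."""
    n = len(R)
    q1 = math.sqrt(1 - p1 ** 2)
    q = [q1 * R[0][i] for i in range(n)]
    p = [math.sqrt(1 - x * x) for x in q]
    return [[(R[i][j] - q[i] * q[j]) / (p[i] * p[j]) for j in range(n)]
            for i in range(n)]


# Keys are test numbers as in the text: (2, 3), (2, 4), (3, 4).
partials = {(i + 1, j + 1): partial(R, i, j, 0)
            for i, j in [(1, 2), (1, 3), (2, 3)]}
# Almost complete selection: p1 very small but not zero.
nearly = after_selection(R, p1=1e-4)

print({k: round(v, 3) for k, v in partials.items()})
print(nearly[0][1], nearly[1][2])
```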

In this special case of "partial correlation," where the
directly selected test is so stringently selected that everyone
in the sample has exactly the same score in it, our formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

has a more familiar form. For since

$$q_i = q_1 R_{1i} \qquad\text{and}\qquad q_1 = 1$$

in this case of complete shrinkage we have

$$q_i = R_{1i} \qquad\text{and}\qquad p_i = \sqrt{1 - R_{1i}^2}$$

so that our formula becomes

$$r_{ij} = \frac{R_{ij} - R_{1i}R_{1j}}{\sqrt{1 - R_{1i}^2}\,\sqrt{1 - R_{1j}^2}}$$

the usual form of a partial correlation coefficient. Its
more conventional notation is, calling the test which is
made constant Test k instead of Test 1:

$$r_{ij.k} = \frac{r_{ij} - r_{ik}r_{jk}}{\sqrt{1 - r_{ik}^2}\,\sqrt{1 - r_{jk}^2}}$$

If the "test" which is held constant is the factor g,
this becomes

$$r_{ij.g} = \frac{r_{ij} - r_{ig}r_{jg}}{\sqrt{1 - r_{ig}^2}\,\sqrt{1 - r_{jg}^2}}$$

which is called the "specific correlation" between i and j.
Its numerator is the "residue" left after removing the
correlation due to g. If g is the sole cause of correlation,
making g constant will destroy the correlation and we shall
have

$$r_{ij} = r_{ig}\,r_{jg}$$

as we already saw from another point of view was the case
in a hierarchical battery, in Section 4 of Chapter I.


7. Effect on communalities.—The formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

is thus a very useful formula, including partial correlation
as a special case. If the original variances are each taken
as unity, the numerator $R_{ij} - q_i q_j$ for $i \neq j$ gives the new
covariances, while $p_i^2$ and $p_j^2$ are the new variances.

It also includes as a special case the formula known as
the Otis-Kelley formula, which is applicable when two
variates have both shrunk to the same extent (a restriction
not always recognized). If we put $q_i = q_j$ and therefore
$p_i = p_j$ it becomes

$$
\begin{aligned}
p_i^2\, r_{ij} &= R_{ij} - q_i^2 = R_{ij} - 1 + p_i^2 \\
p_i^2\,(1 - r_{ij}) &= 1 - R_{ij} \\
\frac{1 - R_{ij}}{1 - r_{ij}} &= p_i^2 = \frac{\sigma_i^2}{\Sigma_i^2} = \frac{\sigma_j^2}{\Sigma_j^2}
\end{aligned}
$$

the last line being the Otis-Kelley formula.



It has a still further application (Thomson, 1938b, 456),
for if a matrix of correlations in the wider population has
been analysed by Thurstone's process, this same formula
gives the new communalities (with one exception) to be
expected in the sample, if we put $i = j$ and understand by
$R_{ii}$ the communality in the wider population, by $r_{ii}$ the
communality in the sample (and not a reliability coefficient,
which is the usual meaning of this symbol). Using the
usual symbol $h^2$ for communality we have the formula in
the form

$$h_i^2 = \frac{H_i^2 - q_i^2}{p_i^2}$$

The exception is the new communality of the trait or
quality which has been directly selected, in our example
No. 1 the Stanford-Binet scores. For the directly selected
trait the new communality is given by

$$h_1^2 = \frac{p_1^2 H_1^2}{1 - q_1^2 H_1^2}$$

(Thomson, 1938b, 455; and see also Ledermann, 19380). 
With these formulæ we can see what is likely to happen
to a whole factorial analysis when the persons who are the
subjects of the tests are only a sample of the wider
population in which the analysis was first made.

8. Hierarchical numerical example.—We shall take, in
the first place, the perfectly hierarchical example of our
Chapter I. But to save space in the tables we shall
consider only the first four tests. Their matrix of correlations,
with the one common factor and the four specifics added,
and with communalities inserted in the diagonal cells, was
as follows:

         1      2      3      4   |   g     s1     s2     s3     s4
1      (.81)   .72    .63    .54  |  .90    .44     .      .      .
2       .72   (.64)   .56    .48  |  .80     .     .60     .      .
3       .63    .56   (.49)   .42  |  .70     .      .     .71     .
4       .54    .48    .42   (.36) |  .60     .      .      .     .80
g       .90    .80    .70    .60  | 1.00     .      .      .      .
s1      .44     .      .      .   |   .    1.00     .      .      .
s2       .     .60     .      .   |   .      .    1.00     .      .
s3       .      .     .71     .   |   .      .      .    1.00     .
s4       .      .      .     .80  |   .      .      .      .    1.00


The bottom right-hand quadrant shows, by its zero
entries, that the factors are all uncorrelated with one
another, that is, orthogonal. The tests expressed as linear
functions of the factors are

$$
\begin{aligned}
z_1 &= .9g + .436s_1 \\
z_2 &= .8g + .600s_2 \\
z_3 &= .7g + .714s_3 \\
z_4 &= .6g + .800s_4
\end{aligned}
$$

These equations are only another way of expressing the
same facts as are shown in the north-east, or the
south-west, quadrant of the matrix (where only two places of
decimals are used for the specific loadings, to keep the
printing regular).

Let us now suppose that this matrix and these equations
refer to a wide and defined population, e.g. all
Massachusetts eleven-year-olds, and let us ask what will be the
most likely matrix of correlations between these tests and
factors to be found in a sample chosen by their scores in
Test 1 so as to be more homogeneous. The variance of
Test 1 in the wider population being taken as unity, let
us take that in the more homogeneous select sample as
being $p_1^2 = .36$. We then have, using $q_i = q_1 R_{1i}$ and
treating g and the specifics just like tests, the following
table:

                   1      2      3      4      g     s1    s2   s3   s4
q                 .80   .576   .504   .432   .720   .349    .    .    .
p                 .60   .817   .864   .902   .694   .937    1    1    1
p² (variance)     .36   .668   .746   .813   .482   .878    1    1    1

For the correlations and communalities, using our
formula

$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

we get (again printing only two decimal places):

         1      2      3      4   |   g     s1     s2     s3     s4
1      (.61)   .53    .44    .36  |  .78    .28     .      .      .
2       .53   (.46)   .38    .31  |  .68   -.26    .73     .      .
3       .44    .38   (.32)   .26  |  .56   -.22     .     .83     .
4       .36    .31    .26   (.21) |  .46   -.18     .      .     .89
g       .78    .68    .56    .46  | 1.00   -.39     .      .      .
s1      .28   -.26   -.22   -.18  | -.39   1.00     .      .      .
s2       .     .73     .      .   |   .      .    1.00     .      .
s3       .      .     .83     .   |   .      .      .    1.00     .
s4       .      .      .     .89  |   .      .      .      .    1.00

In the more homogeneous sample, therefore, the
correlations and the communalities of all the tests have
sunk. The g column shows what the new correlations of g
are with the tests; and on examination of the matrix we
see that these, when cross-multiplied with one another,
still give the rest of the matrix. Thus

$$.78 \times .46 = .36 \ (r_{14}), \qquad .68^2 = .46 \ (h_2^2)$$

The test matrix is still of rank 1 (Thomson, 1938b, 453),
and these g-column entries can become the diminished
loadings of the single common factor required by rank 1.
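The sample matrix, including its communalities, can be regenerated from the four loadings on g. The sketch below is in Python (an assumption of this illustration; all names are hypothetical). It treats g and the specifics as further variables, applies the selection formula to every cell, and uses the separate formula $h_1^2 = p_1^2 H_1^2/(1 - q_1^2 H_1^2)$ for the directly selected test.

```python
import math

G_LOADINGS = [0.9, 0.8, 0.7, 0.6]   # population loadings of Tests 1-4 on g
P1_SQUARED = 0.36                   # sample variance of the selected Test 1
N = len(G_LOADINGS)
G = N                               # index of g; specifics are N+1 .. 2N


def population_matrix():
    """Tests, g and specifics, with communalities in the test diagonal."""
    size = 2 * N + 1
    m = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for i, a in enumerate(G_LOADINGS):
        for j, b in enumerate(G_LOADINGS):
            m[i][j] = a * b
        m[i][G] = m[G][i] = a
        m[i][G + 1 + i] = m[G + 1 + i][i] = math.sqrt(1 - a * a)
    return m


def sample_matrix(m):
    """Apply univariate selection on Test 1 to the whole matrix."""
    size = len(m)
    q1 = math.sqrt(1 - P1_SQUARED)
    # Shrinkage of each variable: q1 times its correlation with Test 1.
    q = [q1] + [q1 * m[0][i] for i in range(1, size)]
    p = [math.sqrt(1 - x * x) for x in q]
    r = [[(m[i][j] - q[i] * q[j]) / (p[i] * p[j]) for j in range(size)]
         for i in range(size)]
    # The directly selected test needs the separate communality formula.
    h1 = m[0][0]
    r[0][0] = P1_SQUARED * h1 / (1 - q1 * q1 * h1)
    return q, p, r


q, p, r = sample_matrix(population_matrix())
for row in r:
    print(["%5.2f" % v for v in row])
```

The rank-1 property noted in the text shows itself in the output: the product of any two entries of the g column equals the corresponding test correlation, and the square of an entry equals the communality.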

The columns for the specifics $s_2$, $s_3$ (and later specifics
also) still show only one entry. In the bottom right-hand
quadrant, zero entries show that these specifics are still
uncorrelated with one another and with g, that is, g, $s_2$, $s_3$
and $s_4$ are still orthogonal.

But something has happened to the specific $s_1$. It has
become correlated with g, and with all the tests. It has
become an oblique factor, orthogonal still to the other
specifics, but inclined to g and the tests. It leans further
away from Test 1 than it formerly did, and makes obtuse
angles (negative correlation) with the other tests and with g,
to which it was originally orthogonal.

But since, as we have already pointed out, the test matrix
with the reduced communalities is still of rank 1, it is
clear that a fresh analysis could be made of the tests into
one common factor and specifics, thus:

$$
\begin{aligned}
z_1' &= .778g' + .628s_1' \\
z_2' &= .679g' + .734s_2 \\
z_3' &= .562g' + .827s_3 \\
z_4' &= .462g' + .887s_4
\end{aligned}
$$

In these equations the factors $g'$, $s_1'$, $s_2$, $s_3$, and $s_4$ are
again orthogonal (uncorrelated), and the loadings shown
give the correlations and give unit variances. This is the
analysis which an experimenter would make who began
with the sample and knew nothing about any test
measurements in the whole population.

The reader, comparing the loadings in these equations
with the correlations in the matrix of the sample, will
rightly conclude that the specifics from $s_2$ onward have not
changed. In the matrix it is clear that they are still
orthogonal, and their correlations with the tests, in the
matrix, are the same as their loadings in the equations.
The tests are, in the sample, more heavily loaded with these
specifics than they were in the population, but the specifics
are the same in themselves.

The new specific $s_1'$ the reader will readily agree to be
different from $s_1$. The latter became oblique in the
sample, whereas $s_1'$ is orthogonal. What now is to be said
about the common factors $g$ (in the population) and $g'$ (in
the sample)? From the fact that the loadings of $g'$, in the
sample equations, are identical with the correlations in
the sample matrix of the original $g$ with the tests, one is
tempted to imagine $g'$ and $g$ to be identical in nature. But
that is not so certain.

If we go back to the equations of the tests in the
population, we can rewrite them in the following form:

$$
\begin{aligned}
z_1 &= .467g' + .800g'' + .377s_1' \\
z_2 &= .555g' + .576g'' + .600s_2 \\
z_3 &= .485g' + .504g'' + .714s_3 \\
z_4 &= .417g' + .432g'' + .800s_4
\end{aligned}
$$

with two common factors $g'$ and $g''$ instead of one common
factor $g$. These equations still give the same correlations.
For example:

$$r_{14} = .467 \times .417 + .800 \times .432 = .540 \text{ as before.}$$


In these equations the specifics s2, s3, s4 are the same, and
the communalities of Tests 2, 3, and 4 are the same. All
that we have done in these three tests is to divide the
common factor g into two components. The ratio of the
loading of g” to the loading of g is the same in each of
them. The loadings of g” we have made identical with the
shrinkages q in the table on page 285.

In Test 1 also we have made the loading of g” equal to
the shrinkage q1 = .8. But in this test g” cannot be looked
upon merely as a component of g. To give the correct
correlations, the loading of g’ has to be .467 as shown, and
the communality of Test 1 has been raised from its former
value (.81) to

\[
.467^2 + .800^2 = .858
\]

while the loading of the specific has correspondingly sunk.
The factors g’, g”, and s1’ are a totally new analysis of
Test 1 in the population. Part of the former specific has
been incorporated in the common factors.

Now let the factor g” be abolished, i.e. held constant, so
that the tests (now of less than unit variance, so we write
them with x instead of z) are:

\[
\begin{array}{ll}
 & \text{Variances} \\
x_1 = .467\,g' + .377\,s_1' & .360 \\
x_2 = .555\,g' + .600\,s_2 & .668 \\
x_3 = .485\,g' + .714\,s_3 & .746 \\
x_4 = .417\,g' + .800\,s_4 & .813
\end{array}
\]

The reduced variances are the sum of the squares of the
surviving loadings, e.g.

\[
.467^2 + .377^2 = .360
\]

The variances, it will be seen, are the p²’s of our tests
as measured in the sample. If each of the last set of
equations is divided through by the square root of its
variance, we arrive at the equations

\[
\begin{aligned}
z_1' &= .778\,g' + .628\,s_1' \\
z_2' &= .679\,g' + .734\,s_2 \\
z_3' &= .562\,g' + .827\,s_3 \\
z_4' &= .462\,g' + .887\,s_4
\end{aligned}
\]

which is the analysis already given as that of an experimenter
who knew only the sample. As to the nature of g’,
we can say in Tests 2, 3, and 4 that it is possible to regard
it as a component of the g of the population. But we
cannot do so with assurance in Test 1. There its nature is
more dubious. At all events, it is not the same common
factor as in the population, and at best we can say that it
is one of its components.
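
The arithmetic of the last few steps can be repeated mechanically. The following is only a sketch in Python (the variable and function names are illustrative, not part of the text): it takes the population loadings on g’, g” and the specific, drops g”, and divides through by the square root of the reduced variance.

```python
import math

# Population loadings on g', g'' and the specific for Tests 1 to 4,
# copied from the two-common-factor equations in the text.
loadings = [
    (0.467, 0.800, 0.377),
    (0.555, 0.576, 0.600),
    (0.485, 0.504, 0.714),
    (0.417, 0.432, 0.800),
]

def sample_equation(g1, g2, s):
    """Hold g'' constant (its loading g2 drops out) and restandardize."""
    variance = g1 ** 2 + s ** 2      # reduced variance of the test
    sd = math.sqrt(variance)
    return variance, g1 / sd, s / sd

results = [sample_equation(*row) for row in loadings]
for i, (var, g, s) in enumerate(results, start=1):
    print(f"Test {i}: variance {var:.3f}, z' = {g:.3f} g' + {s:.3f} s")

# Population correlation of Tests 1 and 4 from the two common factors.
r14 = loadings[0][0] * loadings[3][0] + loadings[0][1] * loadings[3][1]
print(f"r14 = {r14:.3f}")
```

The printed loadings agree with the sample analysis to three decimals; the variances agree to within a unit in the last place, since the text works from rounded figures.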

9. A sample all alike in Test 1.—These phenomena are
still more striking if we consider a case where the sample
is composed of persons who are all alike in Test 1. It
would be an excellent exercise for the reader to calculate
the resulting matrix of correlations for tests and population
factors in this case. The tests act in this case as though
their original equations in the population had been

\[
\begin{aligned}
z_1 &= g'' \\
z_2 &= .349\,g' + .720\,g'' + .600\,s_2 \\
z_3 &= .305\,g' + .630\,g'' + .714\,s_3 \\
z_4 &= .262\,g' + .540\,g'' + .800\,s_4
\end{aligned}
\]

and then g” had become zero, i.e. a constant with no
variance.

It perhaps helps to a further understanding of what is
happening to the factors during selection if we realize that
holding the score of Test 1 constant does not hold its factors
g and s1 constant. They can vary in the sample from
man to man, but since

\[
z_1 = .9\,g + .436\,s_1
\]

remains constant, a man in the sample who has a high g
must have a low s1; that is, these factors are negatively
correlated in the sample. And because they are thus
negatively correlated, those members of the sample who
have high g’s, and who will therefore tend to do well in
Tests 2, 3, and 4, will tend to have values below average
(negative values) for their s1, which will be therefore
negatively correlated with these tests, in this sample.
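
The exercise suggested above can be begun with a few lines of Python. This is a sketch under an assumption: it uses the expected covariance after selection on one test in the form r_xy − q² r_x1 r_y1, which is the single-variable case of the matrix formula quoted in the next chapter, with q² = 1 because the whole variance of Test 1 is removed. The function name is illustrative.

```python
import math

g_load = 0.9
s_load = math.sqrt(1 - g_load ** 2)   # .436, the specific loading of Test 1
q2 = 1.0                              # all the variance of Test 1 is removed

def cov_after(r_xy, r_x1, r_y1):
    """Expected covariance of x and y after selection on Test 1 (assumed form)."""
    return r_xy - q2 * r_x1 * r_y1

var_g = cov_after(1.0, g_load, g_load)
var_s1 = cov_after(1.0, s_load, s_load)
corr_g_s1 = cov_after(0.0, g_load, s_load) / math.sqrt(var_g * var_s1)

# Test 2 = .8g + .6 s2, so its population correlation with Test 1 is .72
# and with s1 is zero.
var_t2 = cov_after(1.0, 0.72, 0.72)
corr_s1_t2 = cov_after(0.0, s_load, 0.72) / math.sqrt(var_s1 * var_t2)

print(round(corr_g_s1, 3), round(corr_s1_t2, 3))
```

With Test 1 held quite constant the two factors of that test are perfectly negatively correlated, and s1 acquires a negative correlation with Test 2, as the paragraph above describes.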

So far in our examples we have assumed the sample to
be more homogeneous than the population. But a sample
can be selected to be less homogeneous. In such a case
the same formulae will serve, if we simply make the capital
letters refer to the sample and the small to the population.
In fact, the same tables, with their rôles reversed, can
illustrate this case. In practical life we usually know which
of two groups we would call the sample, and which the
population. But mathematically there is no distinction,
the one is a distortion of the other, and which is the “true”
state of affairs is a question without meaning.

It must also throughout be remembered that all these
formulae and statements refer, not to consequences which
are certain to follow, but to consequences which are to be
expected. If actual samples were made the values experimentally
found in them for correlations, communalities,
loadings, etc., would oscillate about those given by our
formulae, violently in the case of small samples, only
slightly in the case of large samples.

10. An example of rank 2.—The above example has only
one common factor. We turn next to consider an example
with two. Again it is, we suppose, the first test according
to which the sample is deliberately selected, and again
we suppose the “shrinkage” q to be .8. The matrices
of correlations and communalities, in the population and
in the sample, are then as follows, the two factors f1 and f2
and the specifics being treated in the calculation exactly
as if they were tests. To economize room on the page,
we omit the later specifics:


Correlations in the Population

         1      2      3      4      5   |    f1     f2     s1     s2
1     (.65)   .46    .59    .36    .41   |   .70    .40    .59     .
2      .46   (.37)   .36    .26    .23   |   .60    .10     .     .79
3      .59    .36   (.61)   .32    .45   |   .50    .60     .      .
4      .36    .26    .32   (.20)   .22   |   .40    .20     .      .
5      .41    .23    .45    .22   (.34)  |   .30    .50     .      .
f1     .70    .60    .50    .40    .30   | (1.00)    .      .      .
f2     .40    .10    .60    .20    .50   |    .   (1.00)    .      .
s1     .59     .      .      .      .    |    .      .   (1.00)    .
s2      .     .79     .      .      .    |    .      .      .   (1.00)


Correlations in the Sample

         1      2      3      4      5   |    f1     f2     s1     s2
1     (.40)   .30    .40    .23    .26   |   .51    .25    .40     .
2      .30   (.27)   .23    .17    .12   |   .51   -.02   -.21    .85
3      .40    .23   (.50)   .22    .35   |   .32    .54   -.29     .
4      .23    .17    .22   (.13)   .14   |   .30    .12   -.16     .
5      .26    .12    .35    .14   (.26)  |   .15    .44   -.19     .
f1     .51    .51    .32    .30    .15   | (1.00)  -.23   -.36     .
f2     .25   -.02    .54    .12    .44   |  -.23  (1.00)  -.18     .
s1     .40   -.21   -.29   -.16   -.19   |  -.36   -.18  (1.00)    .
s2      .     .85     .      .      .    |    .      .      .   (1.00)


We see here a new phenomenon. The two common
factors f1 and f2 in the population were orthogonal to one
another, as is shown by the zero correlation between them.
But in the sample they are negatively correlated (−.228);
that is, they are oblique. We begin to see a generalization
which can be algebraically proved, that all the factors,
common and specific, which are concerned with the directly
selected test(s) become oblique to each other and to all the tests,
but the specifics of the indirectly selected tests remain orthogonal
to everything, except each to its own test.
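
The obliquity of f1 and f2 in the sample can be checked directly. The sketch below is an illustration only; it assumes the univariate selection formula in the form (r_xy − q² r_x1 r_y1) divided by the square roots of the two reduced variances 1 − q² r_x1² and 1 − q² r_y1², with q = .8, and takes the population correlations with Test 1 from the table above. The names are hypothetical.

```python
import math

q2 = 0.8 ** 2   # squared shrinkage applied to the selected Test 1

# Population correlations with Test 1 of Test 2 and of the two factors.
r_with_1 = {"t2": 0.46, "f1": 0.70, "f2": 0.40}

def sample_corr(r_xy, r_x1, r_y1):
    """Expected sample correlation of x and y after selection on Test 1."""
    cov = r_xy - q2 * r_x1 * r_y1
    var_x = 1 - q2 * r_x1 ** 2
    var_y = 1 - q2 * r_y1 ** 2
    return cov / math.sqrt(var_x * var_y)

r_f1_f2 = sample_corr(0.00, r_with_1["f1"], r_with_1["f2"])
r_t2_f1 = sample_corr(0.60, r_with_1["t2"], r_with_1["f1"])
r_t2_f2 = sample_corr(0.10, r_with_1["t2"], r_with_1["f2"])

print(round(r_f1_f2, 3), round(r_t2_f1, 2), round(r_t2_f2, 2))
```

The first value is the correlation of the two population factors in the sample; the other two are the entries of Test 2 against f1 and f2 in the sample table.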

But the matrix of the tests themselves is still of rank 2,
and an experimenter working only with the sample would
find this out, although he would know nothing about the
population matrix. He would therefore set to work to
analyse it into two common factors, orthogonal to one
another. A Thurstone analysis comes out in two common
factors exactly, and can be rotated until all the loadings
are positive. For example:

Test           1      2      3      4      5
Factor f1’   .570   .521   .436   .332   .238
Factor f2’   .276    .     .555   .130   .452

These factors f’, however, are clearly a different pair
from the factors f in the original population. In the
sample, those original factors (f) are oblique; these (f’)
are orthogonal.
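
That two orthogonal factors suffice for the sample can be verified from the loadings. The following is a small sketch (names illustrative), taking the loading of Test 5 on f1’ as .238, the value that agrees with the sample communality of .26; it rebuilds communalities and a few correlations as sums of products of loadings.

```python
# Orthogonal loadings of the sample analysis for Tests 1 to 5.
f1 = [0.570, 0.521, 0.436, 0.332, 0.238]
f2 = [0.276, 0.000, 0.555, 0.130, 0.452]

def reproduced(i, j):
    """Correlation (or communality when i == j) implied by the two factors."""
    return f1[i] * f1[j] + f2[i] * f2[j]

communalities = [round(reproduced(i, i), 2) for i in range(5)]
r12 = round(reproduced(0, 1), 2)
r35 = round(reproduced(2, 4), 2)

print(communalities, r12, r35)
```

The values printed are those in the diagonal cells and in the corresponding off-diagonal cells of the table of correlations in the sample.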
Again the whole phenomenon is reversible. The second
matrix (with the orthogonal factors f’) might refer to the
population, and a sample picked with a suitable increased
scatter of Variate 1. All our formulae could be worked
backwards, and we should arrive at the matrix beginning
(.65), referring now to the sample. The f’ factors would
have become oblique, and a new analysis, suitably rotated,
would give us the other factors f.

It becomes evident that the orthogonal factors we obtain 
by the analysis of tests depend upon the subpopulation we 
have tested. They are not realities in any physical sense 
of the word; they vary and change as we pass from one 
body of men to another. It is possible, and this is a hope 
hinted at in Thurstone’s book The Vectors of Mind, that if 
we could somehow identify a set of factors throughout all 
their changes from sample to sample (in most of which 
they would be oblique) as being in some way unique, we 
might arrive at factors having some measure of reality 
and fixity. Thurstone, in his latest book Multiple Factor
Analysis, believes that he has achieved this, and that his
oblique Simple Structure is invariant. His claim is con- 
sidered in our next chapter. It is, in the present writer’s 
opinion, justifiable only for univariate selection, not for 
multivariate, which is not merely repeated univariate 
selection. 

11. Random selection.—These considerations deal with
the results to be expected when a sample is deliberately
selected so that the variance of one test is changed to some
desired extent. The new variances and the changed
correlations of the other tests given by our formula
are not the certain result of our action in selecting for Test 1.
If we selected a large number of samples of the same size,
all with the same reduced variance in Test 1, they would
not all be alike in the resulting correlations. On the contrary,
they would all be different. But most of them would
be like the expected set, few would depart widely from that;
and the departures would be in both directions, some
samples lying on the one side, others on the other side,
of our expectation.

If now, instead of selecting samples which are all alike
in the variance of one nominated test, we take a large
number of random samples of the same size, what would we
find? Among them would be a number which were alike
in the variance of Test 1, and these in the other part of
the correlation matrix would have values which varied
round about those given by our formula. We could also
pick out, instead of a set all alike in the variance of Test 1,
a different set all alike in the variance of Test 4, say;
and these would have values in the remainder of the matrix
oscillating about our formula, in which Test 4 would replace
Test 1. In short, a complex family of random samples
would show a structure among themselves such that if we
fix any one variance the average of that array of samples
obeys our formula.* Random sampling will not merely
add an “error specific” to existing factors, it will make
complex changes in the common factors.


* On the author’s suggestion, Dr. W. Ledermann has since
proved this conjecture analytically (Biometrika, 1939a, 30,
295-304). His results cover also the case of multivariate selection (see
next chapter).


CHAPTER XIX 


THE INFLUENCE OF MULTIVARIATE 
SELECTION * 


1. Altering two variances and the covariance.—In the preceding
chapter we have discussed the changes which occur
in the variances and correlations of a set of tests, and in 
their factors, when the sample of persons tested is chosen 
according to their performance in one of the tests: we 
are next going to see the results of picking our sample by 
their performances in more than one of the tests, first of 
all in two of them. Take again, the perfectly hierarchical 
example of the last chapter. We must this time go as far 
as six tests in order to see all the consequences. The matrix 
of correlations of these tests and their factors will be 
simply an extension of that 

Now let us imagine a sample picked so that the variance
of Test 1 and [...]ally altered, [...] and hence their correlation)
[...]ed value. [...]d the laws of logic. What, [...]
sympathetic changes in the [...]e other tests of the battery?
[...]d the variance of Test 1 from [...] diminution in variance to be
[...]as, as is shown o[...] unity to .668, and the consequent [...]
from .72 to .53. Here, however, let us pick our sample so
[...] falling, rises to .833. We have, t[...]y, chosen
people for our sample [...] more alike
than usual in these two test scores, as well as being closely
grouped in each, an unusual but not an inconceivable
sample. Natural selection (which includes selection by the
other sex in mating) has no doubt often preferred individuals
in whom two organs tended to go together, as
long legs with long arms, and the same sort of thing might
occur in mental traits. In terms of variance and covariance
we have changed the matrix:

\[
R_{pp} = \begin{pmatrix} 1.00 & .72 \\ .72 & 1.00 \end{pmatrix}
\]

to the matrix:

\[
V_{pp} = \begin{pmatrix} .36 & .30 \\ .30 & .36 \end{pmatrix}
\]

where

\[
\frac{.30}{\sqrt{.36 \times .36}} = \frac{.30}{.36} = .833,
\]

the new correlation. Notice
that the diagonal entries here (unities in R_pp, and .36, .36
in V_pp) are the variances, not the communalities.


* Thomson, 1937; Thomson and Ledermann, 1938.

2. Aitken’s multivariate selection formula.—We shall
symbolically represent the whole original matrix of variances
and covariances by:

\[
R = \begin{pmatrix} R_{pp} & R_{pq} \\ R_{qp} & R_{qq} \end{pmatrix}
\]

where the subscript p refers to the directly selected or
picked tests, and the subscript q to all the other tests and
the factors. R_pq (and also R_qp) means the matrix of covariances
of the picked tests with all the others, including
the factors. R_qq means the matrix of variances and covariances
of the latter among themselves. Since at the
outset the tests and factors are all assumed to be standardized,
the variances in this whole R matrix are
unity, and the covariances are simply coefficients of
correlation. In our case the R matrix is:


Analysis in the Population

        1     2  |    3     4     5     6     g    s1    s2    s3    s4    s5    s6
1    1.00   .72  |  .63   .54   .45   .36   .90   .44    .     .     .     .     .
2     .72  1.00  |  .56   .48   .40   .32   .80    .    .60    .     .     .     .
3     .63   .56  | 1.00   .42   .35   .28   .70    .     .    .71    .     .     .
4     .54   .48  |  .42  1.00   .30   .24   .60    .     .     .    .80    .     .
5     .45   .40  |  .35   .30  1.00   .20   .50    .     .     .     .    .87    .
6     .36   .32  |  .28   .24   .20  1.00   .40    .     .     .     .     .    .92
g     .90   .80  |  .70   .60   .50   .40  1.00    .     .     .     .     .     .
s1    .44    .   |   .     .     .     .     .   1.00    .     .     .     .     .
s2     .    .60  |   .     .     .     .     .     .   1.00    .     .     .     .
s3     .     .   |  .71    .     .     .     .     .     .   1.00    .     .     .
s4     .     .   |   .    .80    .     .     .     .     .     .   1.00    .     .
s5     .     .   |   .     .    .87    .     .     .     .     .     .   1.00    .
s6     .     .   |   .     .     .    .92    .     .     .     .     .     .   1.00


The R_pp matrix is the square 2 × 2 matrix, the R_qq matrix
the square 11 × 11 matrix, while R_pq has two rows and
eleven columns, R_qp being the same transposed.

Our object is to find what may be expected to happen
to the rest of the matrix when R_pp is changed to V_pp.
Formulae for this purpose were first found by Karl Pearson,
and were put into the matrix form in which we are about
to quote them by A. C. Aitken (Aitken, 1934). The matrix
changes to:

\[
\begin{pmatrix}
V_{pp} & V_{pp} R_{pp}^{-1} R_{pq} \\
R_{qp} R_{pp}^{-1} V_{pp} & R_{qq} - R_{qp}\left(R_{pp}^{-1} - R_{pp}^{-1} V_{pp} R_{pp}^{-1}\right) R_{pq}
\end{pmatrix}
\]

and in order to explain the meaning of these formulae we
shall carry out the calculation for a part of the above matrix
only (the first four tests), with a strong recommendation to
the reader to perform the whole calculation systematically.
If we confine ourselves to the first four tests we have:

\[
R_{pp} = \begin{pmatrix} 1.00 & .72 \\ .72 & 1.00 \end{pmatrix}, \qquad
R_{qq} = \begin{pmatrix} 1.00 & .42 \\ .42 & 1.00 \end{pmatrix},
\]
\[
R_{pq} = \begin{pmatrix} .63 & .54 \\ .56 & .48 \end{pmatrix}, \qquad
R_{qp} = \begin{pmatrix} .63 & .56 \\ .54 & .48 \end{pmatrix}.
\]

The most tiresome part of the calculation, if the number
of directly selected tests is large, is to find R_pp^{-1}, the
reciprocal of the matrix R_pp, such that the product

\[
R_{pp} R_{pp}^{-1} = I
\]

where I is the so-called “unit matrix” which has unit
entries in the diagonal and zero entries everywhere else.
The method of doing this is given in Chapter XIV,
Section 9, page 210. In the present example, where R_pp
is only of dimensions 2 × 2, we soon find:

\[
R_{pp}^{-1} = \begin{pmatrix} 2.0764 & -1.4950 \\ -1.4950 & 2.0764 \end{pmatrix}
\]

When the reciprocal matrix R_pp^{-1} has thus been calculated,
the best way of proceeding is to find:

\[
C = R_{pp}^{-1} R_{pq}, \qquad D = R_{qq} - R_{qp} C
\]

In the case of our example these are:

\[
C = \begin{pmatrix} 2.0764 & -1.4950 \\ -1.4950 & 2.0764 \end{pmatrix}
\begin{pmatrix} .63 & .54 \\ .56 & .48 \end{pmatrix}
= \begin{pmatrix} .4709 & .4037 \\ .2209 & .1894 \end{pmatrix}
\]
\[
D = \begin{pmatrix} 1.00 & .42 \\ .42 & 1.00 \end{pmatrix}
- \begin{pmatrix} .63 & .56 \\ .54 & .48 \end{pmatrix}
\begin{pmatrix} .4709 & .4037 \\ .2209 & .1894 \end{pmatrix}
= \begin{pmatrix} 1.00 & .42 \\ .42 & 1.00 \end{pmatrix}
- \begin{pmatrix} .4204 & .3604 \\ .3604 & .3089 \end{pmatrix}
= \begin{pmatrix} .5796 & .0596 \\ .0596 & .6911 \end{pmatrix}
\]

subtraction of matrices being carried out by subtracting
each element from the corresponding one. We next need:

\[
V_{pp} C = \begin{pmatrix} .36 & .30 \\ .30 & .36 \end{pmatrix}
\begin{pmatrix} .4709 & .4037 \\ .2209 & .1894 \end{pmatrix}
= \begin{pmatrix} .2358 & .2022 \\ .2208 & .1893 \end{pmatrix}
\]

which gives us the new covariances of the directly selected
tests with those indirectly selected. For V_qq we need still
C′(V_pp C), where the prime indicates that the matrix is
transposed (rows becoming columns):

\[
C'(V_{pp} C) = \begin{pmatrix} .4709 & .2209 \\ .4037 & .1894 \end{pmatrix}
\begin{pmatrix} .2358 & .2022 \\ .2208 & .1893 \end{pmatrix}
= \begin{pmatrix} .1598 & .1370 \\ .1370 & .1175 \end{pmatrix}
\]

and then:

\[
V_{qq} = D + C'(V_{pp} C)
= \begin{pmatrix} .5796 & .0596 \\ .0596 & .6911 \end{pmatrix}
+ \begin{pmatrix} .1598 & .1370 \\ .1370 & .1175 \end{pmatrix}
= \begin{pmatrix} .7394 & .1966 \\ .1966 & .8086 \end{pmatrix}
\]

We now can write down the whole new 4 × 4 matrix
of variances and covariances. In the same way, had we
included the other tests and the factors, we would have
arrived at the whole new 13 × 13 matrix for all the
variances and covariances which we now print.* The
values calculated above for the first four tests will be
recognized in its top left-hand corner. (The diagonal
entries are variances, not communalities.)
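
The reader who prefers to let a machine do the arithmetic may follow the same steps in Python. The sketch below is only an illustration (the helper functions are written for this example and are not part of any library); it uses the four-test matrices of the text and the quantities C, D, V_pp C and V_qq exactly as defined above.

```python
def matmul(a, b):
    """Product of two matrices held as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add(a, b, sign=1):
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

R_pp = [[1.00, 0.72], [0.72, 1.00]]   # picked tests before selection
R_pq = [[0.63, 0.54], [0.56, 0.48]]   # picked tests with Tests 3 and 4
R_qq = [[1.00, 0.42], [0.42, 1.00]]   # Tests 3 and 4 among themselves
V_pp = [[0.36, 0.30], [0.30, 0.36]]   # picked tests after selection

C = matmul(inverse_2x2(R_pp), R_pq)
D = add(R_qq, matmul(transpose(R_pq), C), sign=-1)   # R_qq - R_qp C
V_pq = matmul(V_pp, C)                               # new covariances p with q
V_qq = add(D, matmul(transpose(C), V_pq))            # D + C'(V_pp C)

for name, m in (("C", C), ("D", D), ("V_pq", V_pq), ("V_qq", V_qq)):
    print(name, [[round(v, 4) for v in row] for row in m])
```

Carried to full precision the figures may differ from those printed in the text by a unit in the fourth decimal place, since the text rounds at each step.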


Covariances in the Sample

        1     2  |    3     4     5     6     g    s1    s2    s3    s4    s5    s6
1     .36   .30  |  .24   .20   .17   .14   .34   .13   .05    .     .     .     .
2     .30   .36  |  .22   .19   .16   .13   .32   .04   .18    .     .     .     .
3     .24   .22  |  .74   .20   .16   .13   .33  -.14  -.07   .71    .     .     .
4     .20   .19  |  .20   .81   .14   .11   .28  -.12  -.06    .    .80    .     .
5     .17   .16  |  .16   .14   .87   .09   .23  -.10  -.05    .     .    .87    .
6     .14   .13  |  .13   .11   .09   .91   .19  -.08  -.04    .     .     .    .92
g     .34   .32  |  .33   .28   .23   .19   .47  -.19  -.10    .     .     .     .
s1    .13   .04  | -.14  -.12  -.10  -.08  -.19   .70   .32    .     .     .     .
s2    .05   .18  | -.07  -.06  -.05  -.04  -.10   .32   .43    .     .     .     .
s3     .     .   |  .71    .     .     .     .     .     .   1.00    .     .     .
s4     .     .   |   .    .80    .     .     .     .     .     .   1.00    .     .
s5     .     .   |   .     .    .87    .     .     .     .     .     .   1.00    .
s6     .     .   |   .     .     .    .92    .     .     .     .     .     .   1.00


* [...] (1937a) paper are extremely [...] matrices of the form XY⁻¹Z [...]
can thus be obtained [...] pivotal operation (see Appendix, paragraph 12).


[...] its own test), are still of unit variance, and have still the
same covariances with their own tests, though these will
become larger correlations when the tests are restandardized;

(2) The specifics of the directly selected tests have
become oblique common factors, correlated with everything
except the other specifics;

(3) The matrix of the indirectly selected tests is still of
the same rank (here rank 1);

(4) The variances of the factors g, s1, and s2 have been
reduced to .47, .70, and .43.
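
The reduced variances in (4), and the new covariance of the two formerly orthogonal specifics, follow from the lower right-hand term of Aitken’s formula applied to a single variable. The sketch below illustrates this under the definitions already given; the helper names are hypothetical, and M stands for the bracketed matrix R_pp^{-1} − R_pp^{-1} V_pp R_pp^{-1}.

```python
import math

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def quad(x, m, y):
    """Bilinear form x' m y for two short vectors."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

R_pp = [[1.00, 0.72], [0.72, 1.00]]
V_pp = [[0.36, 0.30], [0.30, 0.36]]

A = inverse_2x2(R_pp)
AVA = matmul(matmul(A, V_pp), A)
M = [[A[i][j] - AVA[i][j] for j in range(2)] for i in range(2)]

# Population correlations of each factor with the picked Tests 1 and 2.
r_p = {"g": (0.9, 0.8), "s1": (math.sqrt(0.19), 0.0), "s2": (0.0, 0.6)}

variances = {name: 1 - quad(v, M, v) for name, v in r_p.items()}
cov_s1_s2 = 0.0 - quad(r_p["s1"], M, r_p["s2"])

print({k: round(v, 2) for k, v in variances.items()}, round(cov_s1_s2, 2))
```

The positive covariance between s1 and s2 is the germ of the new factor that appears when the sample is analysed afresh.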

An experimenter beginning with this sample, and 
knowing nothing about the factors in the wider population, 
would have no means of knowing these relative variances, 
and would no doubt standardize all his tests. He certainly 
would not think of using factors with other than unit 
variance. And even if he were by a miracle to arrive at 
an analysis corresponding to the last table, with three 
oblique general factors, he would reject it (a) because of 
the negative correlations of some of the factors, and 
(b) because he can reach an analysis with only two common 
factors, and those orthogonal. It is therefore practically 
certain that he will not reach the population factors, at 
least as far as the directly selected tests are concerned. 
His data and his analysis will be as overleaf. The variances
are all made unity and the covariances converted into
correlations. The analysis into factors is a new one, not
derived from the last table.
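
How such an experimenter’s figures might arise can be sketched for the first four tests. This is an illustration under stated assumptions, not the author’s own procedure: the covariances are the four-decimal values calculated earlier, the loadings of the single common factor are estimated from products of correlations that avoid the pair (1, 2), and the extra link between Tests 1 and 2 is divided equally between them (a factor resting on two tests only is otherwise indeterminate).

```python
import math

# Sample covariances of Tests 1 to 4 (four-decimal values from the text).
cov = [
    [0.3600, 0.3000, 0.2358, 0.2022],
    [0.3000, 0.3600, 0.2208, 0.1893],
    [0.2358, 0.2208, 0.7394, 0.1966],
    [0.2022, 0.1893, 0.1966, 0.8086],
]

# Restandardize: covariances become correlations.
sd = [math.sqrt(cov[i][i]) for i in range(4)]
r = [[cov[i][j] / (sd[i] * sd[j]) for j in range(4)] for i in range(4)]

# Loadings of the common factor on Tests 1 and 2, from triads that do
# not involve r12 (assumption: one common factor among the other cells).
g1 = math.sqrt(r[0][2] * r[0][3] / r[2][3])
g2 = math.sqrt(r[1][2] * r[1][3] / r[2][3])

# What remains of r12 is ascribed to a new factor with equal loadings.
h = math.sqrt(r[0][1] - g1 * g2)

print(round(r[0][1], 2), round(g1, 2), round(g2, 2), round(h, 2))
```

On these assumptions the common-factor loadings of the two picked tests come out near .82 and .77, and the loading of the additional factor near .45.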

4. Appearance of a new factor.—The most noticeable
change in this sample analysis, as compared with the
population analysis on page 296, is the appearance of a
new “factor” h linking the directly selected tests, a factor
which is clearly due entirely to that selection. What
degree of reality ought to be attributed to it? Does it
differ from the other factors really, or have they also been
produced by selection, even in the population, which is
only in its turn a sample chosen by natural selection from
past generations?

Otherwise the analysis is still into one common factor
and specifics. The loadings of the common factor are


Analysis in the Sample

        1     2  |    3     4     5     6    g’     h    s1    s2    s3    s4    s5    s6
1    1.00   .83  |  .46   .38   .30   .24   .82   .45   .35    .     .     .     .     .
2     .83  1.00  |  .43   .35   .28   .22   .77   .45    .    .46    .     .     .     .
3     .46   .43  | 1.00   .26   .21   .16   .56    .     .     .    .83    .     .     .
4     .38   .35  |  .26  1.00   .17   .13   .46    .     .     .     .    .89    .     .
5     .30   .28  |  .21   .17  1.00   .11   .37    .     .     .     .     .    .93    .
6     .24   .22  |  .16   .13   .11  1.00   .29    .     .     .     .     .     .    .96
g’    .82   .77  |  .56   .46   .37   .29  1.00    .     .     .     .     .     .     .
h     .45   .45  |   .     .     .     .     .   1.00    .     .     .     .     .     .
s1    .35    .   |   .     .     .     .     .     .   1.00    .     .     .     .     .
s2     .    .46  |   .     .     .     .     .     .     .   1.00    .     .     .     .
s3     .     .   |  .83    .     .     .     .     .     .     .   1.00    .     .     .
s4     .     .   |   .    .89    .     .     .     .     .     .     .   1.00    .     .
s5     .     .   |   .     .    .93    .     .     .     .     .     .     .   1.00    .
s6     .     .   |   .     .     .    .96    .     .     .     .     .     .     .   1.00


less than they were in the population, and this, as our table
of variances and covariances shows, is due to a real
diminution in the variance of the common factor. The
new common factor g’ is a component of the old one.

The loadings of s1 and s2 have also sunk, because they
have been in part turned into a new common factor. The
loadings of the other specifics have risen. But this is
entirely because the variance of the tests has sunk due to
the shrinkage in g, and is not due to any new specifics
being added.

All these considerations make it very doubtful indeed
whether any factors, and any loadings of factors, have
absolute meaning. They appear to be entirely dependent
upon the population in which they are measured, and for
their definition there would be required not only a given
set of tests and a given technical procedure in analysis, but
also a given population of persons.

Professor Thurstone, however, in his new book Multiple
Factor Analysis (1947) gives what he mildly calls “a less
pessimistic interpretation than Godfrey Thomson’s of the
factorial results of selection.”

5. Identity of simple structure factors after univariate
selection.—In that book, Thurstone discusses in Chapter
XIX the effects of selection, and shows by examples that
if a battery of tests yields simple structure with oblique
factors (including, of course, the orthogonal case), then
after univariate selection the same factors (though at new
angles with one another) are identified by the new structure,
which is still simple.

If, for example, the battery which gives the correlations
on our page 152, and yields Figure 26 on page 158, has the
standard deviation of Test 2 reduced to one-half, then by
the methods described on our pages 296-8 we can calculate
that the matrix of correlations and communalities becomes:

        1       2       3       4       5       6
1     .589    .295   -.044   -.140    .366    .000
2     .295    .302    .049    .159    .183    .000
3    -.044    .049    .555    .115    .304    .506
4    -.140    .159    .115    .371   -.087    .000
5     .366    .183    .304   -.087    .439    .322
6     .000    .000    .506    .000    .322    .493

The rank of this matrix is still 3 as it was before selection,
and three centroid factors are found to have loadings:

        I       II      III
1     .409    .647    .058
2     .379    .244   -.315
3     .569   -.444    .184
4     .160   -.271   -.522
5     .585    .174    .257
6     .506   -.350    .337


When these are “extended” in the manner of our page 157
and a diagram like Figure 26 made, we obtain Figure 32.
It is still a triangle, and although its measurements are
changed the same tests are found defining each side as
before. The corners of the triangle may, with Professor
Thurstone, reasonably be claimed to represent the same
factors as before selection, although their correlations have
changed.

The plane of Figure 32 is not the same as the plane of
Figure 26, being at right angles to a different first centroid.
When adjustment is made for this, as Professor Thurstone
has presumably done in his chapter (though, I protest,
without sufficient explanation), then the directly selected
test point has not moved, while the other points have
moved radially away from or towards it.

[Figure 32.]

If the above matrix of centroid loadings is postmultiplied
by the rotating matrix obtained from the diagram,
viz.

\[
\begin{pmatrix}
.721 & .443 & .641 \\
-.499 & -.201 & .744 \\
.480 & -.874 & -.190
\end{pmatrix}
\]

we obtain the new simple structure on the reference vectors:

        A       B       C
1       .       .     .732
2       .     .394    .484
3     .562    .180      .
4       .     .472      .
5     .459      .     .455
6     .702      .       .

If this is compared with the table on page 154 it will be
seen that the zeros are in the same places, although the
non-zero entries have altered (except in Test 6, which was
uncorrelated with the directly selected Test 2, and therefore
is unaffected in composition).

If the correlations between the factors are calculated by
the method of pages 181-2, factor A is found to be still
uncorrelated with B and C, but these last two have a
correlation coefficient of −.3: that is, they are no longer
orthogonal but at an obtuse angle of about 107½°.
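
The postmultiplication, and the angle corresponding to a correlation of −.3, can be checked with the short sketch below (the names are illustrative). It tests only the position of the zeros, which is the point at issue. With the centroid loadings and rotating matrix as printed here, the product gives about .72 for Test 3 on A and about .58 for Test 4 on B, so the tabulated non-zero values for those two tests should be compared with some caution.

```python
import math

centroid = [
    [0.409,  0.647,  0.058],
    [0.379,  0.244, -0.315],
    [0.569, -0.444,  0.184],
    [0.160, -0.271, -0.522],
    [0.585,  0.174,  0.257],
    [0.506, -0.350,  0.337],
]
rotation = [
    [ 0.721,  0.443,  0.641],
    [-0.499, -0.201,  0.744],
    [ 0.480, -0.874, -0.190],
]

# Postmultiply the centroid loadings by the rotating matrix.
structure = [
    [sum(row[k] * rotation[k][j] for k in range(3)) for j in range(3)]
    for row in centroid
]

# Entries within rounding error of zero are treated as zeros.
zero_pattern = [[abs(v) < 0.005 for v in row] for row in structure]

# Angle between two factors whose correlation (cosine) is -.3.
angle_BC = math.degrees(math.acos(-0.3))

for test, row in enumerate(structure, start=1):
    print(test, [round(v, 3) for v in row])
print(round(angle_BC, 1))
```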

6. Multivariate selection and simple structure.—But 
though Thurstone must, I think, be granted his claim that 
univariate selection will not destroy the identity of his 
oblique simple structure factors, but only change their 
intercorrelations, the situation would seem to be very 
different with multivariate selection. 

Multivariate selection is not the same thing as repeated
univariate selection. The latter will not change the rank
of the correlation matrix with suitable communalities, nor
will it change the position of zero loadings in simple structure.
Repeated univariate selection will, it is true, cause
all the correlations to alter, but only indirectly and in such
a way as to preserve rank, simple structure, and factor
identity.

But in multivariate selection it is envisaged that the 
correlation between two variables may itself be directly 
selected, and caused to have a value other than that which 
would naturally follow from the reduction of standard 
deviation in two selected variables. Selection for correla- 
tion is just as easily imagined as is selection for scatter. 
Indeed, in natural selection it is possibly even commoner. 

Once we select for the correlations, however, as well as
for scatter, new “factors” emerge, old ones change. In
this chapter we have supposed a small part R_pp of the whole
correlation matrix to be changed to V_pp and found that
one new factor is created (page 300) or, indeed, two new
oblique factors (page 298). We might have supposed R_pp to
be a larger portion of R: and there is nothing to prevent
us supposing selection to go on for the whole of R, and
writing down a brand-new table of coefficients whose
“factors” would be quite different from those of the original
table. In our example of page 152, for instance,
where the three oblique “factors” coincided in direction
with the communal parts of Tests 1, 4, and 6, there is
nothing to prevent us from writing down, as having
been produced by selection, a new set of correlation coefficients
whose analysis would identify the “factors” with the
communal parts of Tests 2, 3, and 5. In fact, all we would
have to do would be to renumber the rows and columns on
page 152. Such fundamental changes could be produced
by selection: and perhaps they have been, for natural
selection has had plenty of time at its disposal.

Professor Thurstone (his page 458, footnote, in Multiple 
Factor Analysis) classes the new factors produced by 
selection as “incidental factors (which) can be classed 
with the residual factors, which reflect the conditions of 
particular experiments.” But we can hardly dismiss 
them thus easily if, as is conceivable, they have become 
the main or perhaps the only factors remaining, the others 
having disappeared ! 

It may be admitted at once, however, that the actual 
amount of selection from psychological experiment to 
psychological experiment is not likely to make such 
alarming changes in factors. For the use to which factors
are likely to be put in our age, in our century or more, they
are likely to be independent enough of such selection as can
go on in that time, and in that sense Professor Thurstone
is justified in his thesis. Nor am I one to deny “ reality ” 
to any quality merely because it has been produced by 
selection, and may not abide for all time. 


PART VII 
THE NATURE OF FACTORS 


CHAPTER XX 
THE SAMPLING THEORY 


1. Two views. A hierarchical example as explained by one 
general factor —The advance of the science of factorial 
analysis of the mind to its present position has not taken 
place without controversy, and it is the purpose of the pre- 
sent chapter to give a preliminary description of some 
objections which have been frequently raised by the 
present writer (Thomson, 1916, 1919a, 1935b, etc.) which 
he still holds to. 

The contrast between the factorial point of view and
Thomson’s sampling theory can be best seen by considering
the explanation of the same set of correlation coefficients
by both views. To simplify the argument we shall take
in the first place a set of correlation coefficients whose
tetrads are exactly zero, which can therefore be completely
explained by a general factor g and specifics, as in this
table:

        1       2       3       4
1       .     .746    .646    .527
2     .746      .     .577    .471
3     .646    .577      .     .408
4     .527    .471    .408      .

We can more exactly follow the argument if we employ
the vulgar fractions of which these are the decimal
equivalents, namely the following, each divided by 6:

        1       2       3       4
1       .      √20     √15     √10
2      √20      .      √12     √8
3      √15     √12      .      √6
4      √10     √8      √6       .

In this form the tetrad-differences are all obviously zero
by inspection. These correlations can therefore be explained
by one general factor, as in Figure 33, which gives
them exactly.


We have here a general factor of variance 30 which is
the sole cause of the correlations, and specific factors of
variances 6, 15, 30, and 60. The variances of the four
“tests” are 36, 45, 60, and 90. The “communalities”
and “specificities” are

Test              1       2       3       4     Totals
Communality     30/36   30/45   30/60   30/90   420/180 = 2.333
Specificity      6/36   15/45   30/60   60/90   300/180 = 1.667
Totals            1       1       1       1        4

[Figures 33, 34, and 35.]

These communalities can be calculated from the correlation
coefficients, for it will be remembered (Chapter I,
Section 4) that when tetrad-differences are exactly zero,
each correlation coefficient can be expressed as the
product of two correlation coefficients with g (two
“saturations”). Thus:

\[
\begin{aligned}
r_{12} &= r_{1g}\, r_{2g} \\
r_{13} &= r_{1g}\, r_{3g} \\
r_{23} &= r_{2g}\, r_{3g}
\end{aligned}
\]

Therefore:

\[
\frac{r_{12}\, r_{13}}{r_{23}} = \frac{(r_{1g}\, r_{2g})(r_{1g}\, r_{3g})}{r_{2g}\, r_{3g}} = r_{1g}^2
\]

the square of the saturation of Test 1 with g. And when
there is only one common factor, the square of its saturation
is the communality.

The quantity r12 r13/r23, therefore, means, on this theory
of one common factor, the communality, or square of the
saturation with g, of the first test. Its value in our
example is 30/36, or five-sixths.
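
A few lines of Python confirm the figure. The sketch is an illustration only, using the correlations in their exact form (square roots divided by 6) and names chosen for the example.

```python
import math

# Exact correlations of the example: r12 = sqrt(20)/6, and so on.
r12, r13, r14 = (math.sqrt(n) / 6 for n in (20, 15, 10))
r23, r24 = (math.sqrt(n) / 6 for n in (12, 8))
r34 = math.sqrt(6) / 6

# Communality of Test 1 on the theory of one common factor.
communality_1 = r12 * r13 / r23

# One of the tetrad-differences, which should vanish.
tetrad = r12 * r34 - r13 * r24

print(round(communality_1, 4), round(tetrad, 12))
```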

2. The alternative explanation. The sampling theory.
—The alternative theory to explain the zero tetrad-differences
is that each test calls upon a sample of the bonds
which the mind can form, and that some of these bonds are
common to two tests and cause their correlation. In the
present instance we have arranged this artificial example
so that the tests can be looked upon as samples of a very
simple mind, which can form in all 108 bonds (or some
multiple of 108).* The first test uses five-sixths of these
(or 90), the second test four-sixths (or 72), the third three-sixths
(54), and the fourth two-sixths (or 36). These
fractions are the same in value as the communalities of
the former theory. Each of them may be called the
“richness” of the test. Thus Test 1 is most rich, and
draws upon five-sixths of the whole mind. The fractions
r12 r13/r23, which in the former theory were “communalities,”
are in the sampling theory “coefficients of richness.”
They formerly indicated the fraction of each test’s
variance supplied by g; they indicate here the fraction
which each test forms of the whole “mind” (but see later,
concerning “sub-pools”).


* There is nothing mysterious about the number 108. It is
chosen merely because it leads to no fractions in the diagram.
Any large number would do.

Now, if our four tests use respectively 90, 72, 54, and 36 
of the available bonds of the mind, as indicated in Figure 
34, then there may be almost any kind of overlap between 
two of the tests. Any of the cells of the diagram may have 
contents, instead of all being empty except for g and the 
specifics. If we know nothing more about the tests except 
the fractions we have called their “ richnesses,” we cannot 
tell with certainty what the contents of each cell will be ; 
but we can calculate what the most probable contents will 
be. If the first test uses five-sixths and the second test
four-sixths of the mind’s bonds, it is most probable that
there will be a number of bonds common to both tests
equal to $\frac{5}{6} \times \frac{4}{6}$, or 20/36ths of the total number. That is,
the four cells marked a, b, c, d in the diagram, the cells
common to Tests 1 and 2, will most likely contain

$\frac{20}{36} \times 108 = 60$ bonds

between them. By an extension of the same principle we
can find the most probable number in each cell. Thus c,
the number of bonds used in all four of the tests, is most
probably

$\frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{2}{6} \times 108 = 10$ bonds.

In this way we reach the most probable pattern of
overlap of the four tests shown in Figure 35. And this
diagram gives exactly the same correlations as did Figure 33.

Let us try, for example, the value of $r_{23}$ in each diagram.
In Figure 33 we had

$r_{23} = \dfrac{30}{\sqrt{45 \times 60}} = \dfrac{\sqrt{12}}{6} = .577$

In Figure 35 the same correlation is

$r_{23} = \dfrac{20 + 10 + 4 + 2}{\sqrt{72 \times 54}} = \dfrac{\sqrt{12}}{6} = .577$
This form of overlap, therefore, will give zero tetrad-differences,
just as the theory of one general factor did.
More exactly, this sampling theory gives zero tetrad- 
differences as the most probable (though not the certain) 
connexion to be found between correlation coefficients 
(Thomson, 1919a) if the sampling of causes is random. 
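
The most probable contents of every cell can be worked out mechanically. The following is a minimal sketch, assuming (as the text does) a pool of 108 bonds and richnesses of five-, four-, three-, and two-sixths; the function and variable names are ours. Each cell is taken to hold the pool multiplied by p for every test that uses it and by 1 − p for every test that does not.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

TOTAL_BONDS = 108
richness = [Fraction(5, 6), Fraction(4, 6), Fraction(3, 6), Fraction(2, 6)]

def expected_cell(membership):
    """Most probable number of bonds used by exactly the tests flagged True."""
    count = Fraction(TOTAL_BONDS)
    for used, p in zip(membership, richness):
        count *= p if used else 1 - p
    return count

# One entry per cell of the diagram (including the cell used by no test).
cells = {m: expected_cell(m) for m in product([True, False], repeat=4)}

def overlap(i, j):
    """Bonds common to tests i and j (0-based), summed over the relevant cells."""
    return sum(n for m, n in cells.items() if m[i] and m[j])

def correlation(i, j):
    """Overlap divided by the geometric mean of the two test totals."""
    size_i = TOTAL_BONDS * richness[i]
    size_j = TOTAL_BONDS * richness[j]
    return float(overlap(i, j)) / sqrt(float(size_i * size_j))

print(cells[(True, True, True, True)])  # 10 bonds used by all four tests
print(overlap(0, 1))                    # 60 bonds common to Tests 1 and 2
print(round(correlation(1, 2), 3))      # 0.577, the value of r23
```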

If we let $p_1$, $p_2$, $p_3$, and $p_4$ represent the fractions which the
four tests form of the whole pool of N bonds of the mind,
then the number common to the first two tests will most
probably be $p_1 p_2 N$, and the correlation between the tests is

$r_{12} = \dfrac{p_1 p_2 N}{\sqrt{p_1 N \cdot p_2 N}} = \sqrt{p_1 p_2}$

We therefore have, in any tetrad, quantities like the
following:

            3                    4
  1   $\sqrt{p_1 p_3}$     $\sqrt{p_1 p_4}$
  2   $\sqrt{p_2 p_3}$     $\sqrt{p_2 p_4}$

and the tetrad-difference is, most probably (Thomson,
1927a, 253):

$\sqrt{p_1 p_3 p_2 p_4} - \sqrt{p_1 p_4 p_2 p_3} = 0$


This may be expressed by saying that the laws of proba- 
bility alone will cause a tendency to zero tetrad-differences 
among correlation coefficients. In another form this 
statement can be worded thus: The laws of probability or 
chance cause any matrix of correlation coefficients to tend 
to have rank 1, or at least to tend to have a low rank (where 
by rank we mean the maximum order among those non- 
vanishing minors which avoid the principal diagonal 
elements). 
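
A small numerical sketch of the same point: if every correlation takes its most probable value, the square root of the product of the two richnesses, then every tetrad-difference that avoids the diagonal vanishes. The five richnesses below are arbitrary assumed values, and the helper names are ours.

```python
from itertools import combinations
from math import sqrt

def sampling_correlations(p):
    """Most probable correlation matrix: r_ij = sqrt(p_i * p_j) off the diagonal."""
    n = len(p)
    return [[1.0 if i == j else sqrt(p[i] * p[j]) for j in range(n)]
            for i in range(n)]

def tetrad_differences(r, a, b, c, d):
    """The three tetrad-differences that can be formed from tests a, b, c, d."""
    return (
        r[a][b] * r[c][d] - r[a][c] * r[b][d],
        r[a][b] * r[c][d] - r[a][d] * r[b][c],
        r[a][c] * r[b][d] - r[a][d] * r[b][c],
    )

p = [0.9, 0.55, 0.3, 0.2, 0.7]  # assumed richnesses, for illustration only
r = sampling_correlations(p)

largest = max(
    abs(t)
    for quad in combinations(range(len(p)), 4)
    for t in tetrad_differences(r, *quad)
)
print(largest < 1e-12)  # every tetrad-difference is zero up to rounding error
```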

It is, in the opinion of the present writer, this fact—a 
result of the laws of chance and not of any psychological 
laws—which has made conceivable the analysis of mental 
abilities into a few common factors (if not into one only, 
as Spearman hoped) and specifics. Because of the laws 
of chance the mind works as if it were composed of these 
hypothetical factors g, v, n, etc., and a number of specific 
factors. The causes may be “anarchic,” meaning that 
they are numerous and unconnected, yet the result is
“monarchic,” or at least “oligarchic,” in the sense that
it may be so described—provided always that large specific 
factors are allowed. 

3. Specific factors maximized.—The specific factors play, 
in the usual methods of factorization, an important rôle, 
and our present example can be used to illustrate the fact, 
which is not usually realized, that all these methods 
maximize the specifics (Thomson, 1938c) by their insistence 
on minimizing the number of common factors. In Figure
33, of the whole variance of 4, the specific factors contribute
1.667, or 41.7 per cent. In Figure 35, they contribute
only

$\frac{10}{90} + \frac{4}{72} + \frac{2}{54} + \frac{1}{36} = \frac{250}{1{,}080} = .2315$, or 5.8 per cent.


Apart from certain trivial exceptions which do not occur 
in practice, it is generally true that minimizing the number 
of common factors maximizes the variance of the specifics. 
Innumerable other equivalent analyses of the above correlations
can be made, but they all give a variance to
the specifics which is less than 1.667. Here, for example,
in Figure 36 (page 308), is an analysis which has no general
factor but six other common factors, and which gives a
total specific variance of

$\frac{15}{90} + \frac{6}{72} + \frac{3}{54} = \frac{330}{1{,}080} = .3056$, or 7.6 per cent.

Now, specific factors are undoubtedly a difficulty in any 
analysis, and to have the specific factors made as large and 
important as possible is a heavy price to pay for having as 
few common factors as possible. 
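
The two extreme totals quoted above can be recomputed as follows. This is a sketch under the assumptions of the running example (richnesses or communalities of five-, four-, three-, and two-sixths). In the most probable sampling pattern, the specific part of a test is taken to be the fraction of its bonds used by no other test; the function names are ours.

```python
from fractions import Fraction
from math import prod

richness = [Fraction(5, 6), Fraction(4, 6), Fraction(3, 6), Fraction(2, 6)]

def specific_variance_one_factor(communalities):
    """Total specific variance when one general factor explains the correlations."""
    return sum(1 - h for h in communalities)

def specific_variance_sampling(p):
    """Total specific variance in the most probable sampling pattern:
    for each test, the fraction of its bonds that no other test uses."""
    total = Fraction(0)
    for i in range(len(p)):
        others = (1 - q for j, q in enumerate(p) if j != i)
        total += prod(others, start=Fraction(1))
    return total

g_total = specific_variance_one_factor(richness)
sampling_total = specific_variance_sampling(richness)
print(g_total, float(g_total) / 4)                # 5/3, about 41.7 per cent of 4
print(sampling_total, float(sampling_total) / 4)  # 25/108, about 5.8 per cent of 4
```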
That specific factors are a difficulty seems to be recognized
by Thurstone. “The specific variance of a test,” he
writes (Vectors, 63), “should be re[…]”
and he looks forward to splitting a specific factor up into
group factors by brigading the […] companion tests in a […]
the dissolution of specifics […] to happen if each analysis […]
making the specific variances as large as possible. We
must, however, leave this point here, to return to it later.

4. Sub-pools of the mind.—A difficulty which will occur 
to the reader in connexion with the sampling theory is that, 
when the correlation between two tests is large, it seems to 
imply that each needs nearly the whole mind to perform 
it (Spearman, 1928, 257). In our example the correlation
between Tests 1 and 2 was .746, a correlation not infrequently
reached between actual tests. It is, for instance,
almost exactly the correlation reported by Alexander 
between the Stanford-Binet test and the Otis Self- 
administering test (Alexander, 1935, Table XVI). Does 
this, then, mean that each of these tests requires the 
activity of about four-sixths or five-sixths of all the 
“bonds” of the brain? Not necessarily, even on the 
sampling theory. These two tests are not so very unlike
one another, and may fairly be described as sampling the
same region of the mind rather than the whole mind, so
that they may well include a rather large proportion of the 
bonds found in that region. They may be drawn, that is, 
from a sub-pool of the mind’s bonds rather than from the 
whole pool (Thomson, 1935b, 91; Bartlett, 1937a, 102).
Nor need the phrase “region of the mind” necessarily 
mean a topographical region, a part of the mind in the 
same sense as Yorkshire is part of England. It may mean
something, by analogy, more like the lowlands of England,
all the land easily accessible to everybody, lying below,
say, the 300-foot contour line. What the “bonds” of the
mind are, we do not know. But they are fairly certainly 
associated with the neurones or nerve cells of our brains, 
of which there are probably round about ten thousand 
million in each normal brain. Thinking is accompanied 
by the excitation of these neurones in patterns. The 
simplest patterns are instinctive, more complex ones
acquired. Intelligence is possibly associated with the 
number and complexity of the patterns which the brain 
can (or could) make. A “region of the mind” in the 
above paragraph may be the domain of patterns below a 
certain complexity, as the lowlands of England are below
a certain contour line. Intelligence tests do not call upon
brain patterns of a high degree of complexity, for these
are always associated with acquired material and with the 
educational environment, and intelligence tests wish to 
avoid testing acquirement. It is not difficult to imagine 
that the items of the Stanford-Binet test call into some 
sort of activity nearly all the neurones of the brain, though 
they need not thereby be calling upon all the patterns 
which those neurones can form. When a teacher is 
demonstrating to an advanced class that “a quadratic 
form of rank 2 is identically equal to the product of 
two linear forms,” he is using patterns of a complexity far 
greater than any used in answering the Binet-Simon items.
But the neurones which form these patterns may not be
more numerous. Those complicated patterns, however,
are forbidden to the intelligence tester, for a very intelligent 
man may not have the ghost of an idea what a “ quadratic 
form” is. Within the limits of the comparatively simple 
patterns of the brain which they evoke, it seems very 
possible that the two tests in question call upon a large
proportion of these, and have a large number in common. 

As has been indicated, the author is of opinion that
the way in which they magnify specific factors is the […]
theories of a few common factors. That does not mean,
however, that a description of a matrix of
correlations in terms of these theories is inexact. Men
undoubtedly do perform mental tasks as if they were doing
so by means of a comparatively small number of group
factors of wide extent, and an enormous number of specific
factors of very narrow range but of great importance each
[…]. Whether a description of their powers in
terms of […] common factors only is a good description
[…] on what purpose we want the […]. The practical purpose is usually
to give vocational or educational advice to the man or to
[…], and factors, though they cannot
[…] blur the accuracy of vocational
estimates, may […] facilitate them where otherwise
they would […] impossible, as money facilitates
trade where barter is impossible.

As a theoretical account of each man’s mind, however,
the theories which use the smallest number of common
factors seem to have drawbacks. They can give an exact 
reproduction of the correlation coefficients. But, because 
of their large specific factors, they do not enable us to give 
an exact reproduction of each man’s scores in the original 
tests, so that much information is being lost by their 
use. 

It will be seen from considerations such as these that 
alternative analyses of a matrix of correlations, even 
although they may each reproduce the correlation coeffi- 
cients exactly, may not be equally acceptable on other 
grounds. The sampling theory, and the single general
factor theory, can both describe exactly a hierarchical set 
of correlation coefficients, and they both give an explana- 
tion of why approximately hierarchical sets are found in 
practice. In a mathematical sense, they are alternatives. 
But we cannot keep both as realities, though we may 
employ either mathematically. 

5. The inequality of men.—Professor Spearman opposed 
the sampling theory chiefly on the ground that it would 
make all correlations equal (and zero), and involve the 
further consequence that all men are equal in their average 
attainments (Abilities, 96), if the number of elementary 
bonds is large, as the sampling theory requires. Both 
these objections, however, arise from a misunderstanding 
of the sampling theory, in which a sample means “ some 
but not all ” of the elementary bonds (Thomson, 1935b, 
72, 76). As has been explained, tests can differ, on this 
theory, in their richness or complexity, and less rich tests 
will tend to have low, more complex tests will tend to have 
high correlations, at any rate if the “ bonds ” tend to be 
all-or-none in their nature, as the action of neurones is
known to be. And as for the assertion that the theory
makes all men equal, there is no basis whatever for the
suggestion that it assumes every man to have an equal
chance of possessing every element or bond. On the contrary,
the sampling theory would consider men also to be
samples, each man possessing some, but not all, both of the
inherited and the acquired neural bonds which are the
physical side of thought. Like the tests, some men are
rich, others poor, in these bonds. Some are richly endowed
by heredity, some by opportunity and education; some
by both, some by neither. The idea that men are samples
of all that might be, and that any task samples the powers
which an individual man possesses, does not for a moment
carry with it the consequences asserted of equal correlations
and a humdrum mediocrity among human kind.

6. Negative and positive correlations.*—The great majority
of correlation coefficients reported in both […]
and psychological work are positive. This almost certainly
represents an actual fact, namely that desirable qualities
in mankind tend to be positively correlated; for though
reported correlations may be selected by the […]
prejudices of experimenters, who are usually on the look-out
for things which correlate positively, yet as those who
have tried know, it is really very difficult to discover
negative correlations between mental tests. Besides, even
in imagination we cannot make a race of beings with
predominantly negative correlations. A number of lists
of the same persons in order of merit can be all very like
one another, can indeed all be identical, but they cannot
all be the opposite of one another. If Lists a and b are
the inverse of one another, List c, if it is negatively
correlated with a, will be positively correlated with b.
Among a number n of variates, it is logically possible to
have a square table of correlation coefficients each equal
to unity; that is, an average correlation of unity. But
the farthest the average correlation can be pushed in the
negative direction is $-1/(n - 1)$. That is, if n is large,
the average correlation can range from +1 to only very
little below zero. Even Mother Nature, then, by natural
[…] means, could not endow man […] both many and large negative
correlations. If they were many, they would have to be
very small; if they were large, they would have to be
very few.
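
The limit of −1/(n − 1) follows from the fact that the variance of a sum can never be negative. A minimal sketch, assuming n variates in standard measure (the function names are ours): the variance of their sum is n plus n(n − 1) times the average correlation, and this reaches zero exactly at the limit stated.

```python
def variance_of_sum(n: int, mean_r: float) -> float:
    """Variance of the sum of n standardized variates whose n(n-1)
    off-diagonal correlations average mean_r."""
    return n + n * (n - 1) * mean_r

def lowest_mean_correlation(n: int) -> float:
    """Most negative average correlation that keeps the variance non-negative."""
    return -1.0 / (n - 1)

for n in (3, 10, 100):
    bound = lowest_mean_correlation(n)
    # Prints n and the bound: -0.5, -0.1111, and -0.0101 respectively.
    print(n, round(bound, 4))

# Any average correlation below the bound would imply a negative variance.
print(variance_of_sum(10, -0.2) < 0)  # True
```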


* This section refers to correlations between tests. The greater
frequency of negative correlations between persons has already been
discussed in Chapter XVI, Section 8.

Natural selection has probably tended, on the whole, to
favour positive correlations within the species.* In the case
of some physical organs it is obvious that a high positive 
correlation is essential to survival value—for example, 
between right and left leg, or between legs and arms. In 
these cases of actual paired organs, however, it is doubtless 
more than a mere figure of speech to speak of a common 
factor as the cause. Between organs not simply related 
to one another, as say eyes and nose, natural selection, 
if it tended towards negative correlation, would probably 
split the genus or species into two, one relying mainly on 
eyesight, the other mainly on smell. Within the one 
species, since it is mathematically easier to make positive
than negative correlations, it seems likely that the former 
would largely predominate. To say that this was due to 
a general factor would be to hypostatize a very complex 
and abstract cause. To use a general factor in giving a 
description of these variates is legitimate enough, but is, 
of course, nothing more than another way of saying that
the correlations are mainly positive, if, as is the case, most
people mean by a general factor one which helps in every
case, not an interference factor which sometimes helps and
sometimes hinders.

* An important kind of natural selection is the selection of one sex
by the other in mating. Dr. Bronson Price (1936) has pointed out
that positive cross-correlation in parents will produce positive correlation
in the offspring. Price further shows that this positive cross-correlation
in the parents will result if the mating is highly homogamous
for total or average goodness in the traits, a conclusion which,
it may be remarked here, can be easily seen by using the pooling
square described in our Chapter XIV. Price concludes: “The
intercorrelations which g has been presumed to illumine are seen
primarily as consequences of the social and therefore marital
importance which has attached to the abilities concerned.” Price
in his argument makes use of formulae from Sewall Wright (1921).
M. S. Bartlett, in a note on Price’s paper (Bartlett, 1937b), develops
his argument more generally, also using Wright’s formulae, and says:
“Price contrasts the idea of elementary genetic components with
factor theories. . . . It should, however, be pointed out that a
statistical interpretation of such current theories can be and has been
advocated. Thomson has, for example, shown . . .”, and here
follows a brief outline of the sampling theory. “On the basis of
Thomson’s theory,” Bartlett adds, “I have pointed out (Bartlett,
1937) that general and specific abilities may naturally be defined
in terms of these components, and that while some statistical
interpretation of these major factors seems almost inevitable, this
may not in itself render their conception invalid or useless.”

7. Low reduced rank.—It is, however, on the tendency 
to a low reduced rank in matrices of mental correlations 
that the theory of factors is mainly built. It has very 
much impressed people to find that mental correlations 
can be so closely imitated by a fairly small number of 
common factors. Ignoring the host of large specific factors 
to which this view commits them, they have concluded 
that the agreement was so remarkable that there must be 
something in it. There is; but it is almost the opposite of 
what they think. Instead of showing that the mind has 
a definite structure, being composed of a few factors which 
work through innumerable specific machines, the low rank 
shows that the mind has hardly any structure. If the 
early belief that the reduced rank was in all cases one had 
been confirmed, that would indeed have shown that the 
mind had no structure at all but was completely undiffer- 
entiated. It is the departures from rank 1 which indicate 
structure, and it is a significant fact that a general tendency 
is noticeable in experimental reports to the effect that 
batteries do not permit of being explained by as small a 
number of factors in adults as in children, probably because 
in adults education and vocation have imposed a structure 
on the mind which is absent in the young. 

By saying that the mind has little structure, nothing 
derogatory is meant. The mind of man, and his brain, too,
are marvellous and wonderful. All that is meant by the
absence of structure is the absence of any fixed or strong
linkages among the elements (if the word may for a moment
be used without implications) of the mind, so that any
sample whatever of those elements or components can be 
assembled in the activity called for by a “ test.” 

Not that there is any necessity to suppose that the mind
is composed of separate and atomic elements. It is possibly
a continuum, its elements if any being more like
[…] a dissolved crystalline substance than
[…]. The only reason for using the word
“elements” is that it is difficult, if not impossible, to speak
of the different parts of the mind without assuming some
“items” in terms of which to think. For concreteness it
is convenient to identify the elements, on the mental side,
with something of the nature of Thorndike’s “bonds,”
and on the bodily side with neurone arcs; in the remainder
of this chapter the word “bonds” will be used. But 
there is no necessity beyond that of convenience and 
vividness in this. The “bonds” spoken of may be 
identified by different readers with different entities. All 
a “bond ” means, is some very simple aspect of the causal 
background. Some of them may be inherited, some may 
be due to education. There is no implication that the 
combined action of a number of them is the mere sum of 
their separate actions. There is no commitment to 
“mental atomism.” 

If, now, we have a causal background comprising in- 
numerable bonds, and if any measurement we make can 
be influenced by any sample of that background, one 
measurement by this sample and another by that, all 
samples being possible; and if we choose a number of 
different measurements and find their intercorrelations, 
the matrix of these intercorrelations will tend to be 
hierarchical, or at least tend to have a low reduced rank. 
This has nothing to do with the mind: it is simply a 
mathematical necessity, whatever the material used to 
illustrate it.

8. A mind with only six bonds.—We shall illustrate this
fact first by imagining a “mind” which can form only
six “bonds,” which mind we submit to four tests
which are of different degrees of richness, the one requiring
the joint action of five bonds, the others of four, three, and
two respectively (Thomson, 1927b). These four tests will
(when we give them to a number of such minds) yield
correlations with one another. For we shall suppose the
different minds not all to be able to form all six of the
possible bonds, some individuals possessing all six, others
possessing smaller numbers.

We have specified the richness of each test, but
have not said which bonds form each ability. There may,
therefore, be different degrees of overlap between them,
though some will be more frequent than others if we form
all the possible sets of four tests which are of richness five, 
four, three, and two. If we call the bonds a, b, c, d, e, 


and f, then one possible pattern of overlap would be the
following:

  Test | Bonds
  1    | a b c d e .
  2    | . b c d e .
  3    | . . . d e f
  4    | . . c d . .

If we for further simplicity suppose these bonds to be
equally important, and use the formula

  Correlation = Overlap / (geometrical mean of the two totals)

we can calculate the correlations which these four tests
would give, namely:

          1                 2                 3                 4
  1       .           $4/\sqrt{20}$     $2/\sqrt{15}$     $2/\sqrt{10}$
  2  $4/\sqrt{20}$          .           $2/\sqrt{12}$     $2/\sqrt{8}$
  3  $2/\sqrt{15}$    $2/\sqrt{12}$           .           $1/\sqrt{6}$
  4  $2/\sqrt{10}$    $2/\sqrt{8}$      $1/\sqrt{6}$            .

and we notice that in this particular pattern all three
tetrad-differences are zero. However, if we picked our
four tests at random (taking care only that they were of
these degrees of richness) we would not always or often get
the above pattern: in point of fact, we would get it only
12 times in 450. Nevertheless, it is one of the most probable
patterns. In all, 78 different patterns of the bonds
are possible (always adhering to our five, four, three, and
two), the probability of each pattern ranging from 12 in
450 down to 1 in 450.
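
The correlations of this one pattern are easily verified. The sketch below assumes the assignment of bonds read from the table above (Test 1 using a to e, Test 2 using b to e, Test 3 using d, e, f, and Test 4 using c and d) and applies the overlap formula; the names are ours.

```python
from math import sqrt

# Assumed pattern of overlap: which of the six bonds each test uses.
tests = {1: set("abcde"), 2: set("bcde"), 3: set("def"), 4: set("cd")}

def correlation(i, j):
    """Overlap divided by the geometrical mean of the two totals."""
    overlap = len(tests[i] & tests[j])
    return overlap / sqrt(len(tests[i]) * len(tests[j]))

# The three tetrad-differences obtainable from four tests.
tetrads = (
    correlation(1, 2) * correlation(3, 4) - correlation(1, 3) * correlation(2, 4),
    correlation(1, 2) * correlation(3, 4) - correlation(1, 4) * correlation(2, 3),
    correlation(1, 3) * correlation(2, 4) - correlation(1, 4) * correlation(2, 3),
)
print(all(abs(t) < 1e-12 for t in tetrads))  # True: all three vanish
```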
It is possible to calculate the tetrad-differences for each
one of the 78 possible patterns of overlap which can occur.
When we then multiply each pattern by the expected frequency
of its occurrence in 450 random choices of the four
tests, we get 450 values for each tetrad-difference, distributed
as follows:

  Values of          Frequency of
  $F \times \sqrt{120}$    $F_1$     $F_2$     $F_3$
       8              0       0       2
       7              0       4       0
       6              0       8      14
       5              9       2   [illegible]
       4             27      34      28
       3              6      12      30
       2             75      72      48
       1             61      66      72
       0             99      54      81
      −1             56      78      36
      −2             67      42      42
      −3             16      30      60
      −4             30      36      18
      −5              0       0       0
      −6              4      12      18
    Total           450     450     450


Although the distribution of each F about zero is slightly
irregular, the average value of each F is exactly zero. For
$F_1$ the variance is

$\sigma^2 = \dfrac{2{,}164}{120 \times 450} = .040$

We see, then, that in this universe of very primitive-minded
men, whose brains can form only six bonds, four
tests which demanded respectively five, four, three, and
two bonds would give tetrad-differences whose expected
value would be zero, the values actually found being
grouped around zero with a certain variance. There is no
particular mystery about the four “richnesses” five, four,
three, and two, by the way. We might have taken any
“richnesses” and got a similar result. If there are no
linkages among the bonds, the most probable value of a
tetrad-difference will always be zero; and if all possible
combinations of the bonds are taken, the average of all the
tetrad-differences will be zero. With only six bonds in the
“mind,” however, the scatter on both sides of zero will be
considerable, as the above value of the standard deviation
of $F_1$ shows, viz.

$\sigma = \sqrt{.040} = .20$
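
With only six bonds the whole universe of possibilities is small enough to enumerate directly. The sketch below is not the author's tabulation by 78 patterns; it simply runs through every way of choosing the four tests (27,000 in all, that is, 60 for every one of the 450 above). Taking $F_1$ to be $r_{13} r_{24} - r_{14} r_{23}$ is our assumption, made because that tetrad reproduces the variance quoted.

```python
from itertools import combinations, product

BONDS = range(6)
RICHNESS = (5, 4, 3, 2)

def tetrad_numerator(t1, t2, t3, t4):
    """Overlap form of r13*r24 - r14*r23, before division by sqrt(5*4*3*2)."""
    return len(t1 & t3) * len(t2 & t4) - len(t1 & t4) * len(t2 & t3)

# Every possible choice of bonds for each test, then every combination of tests.
choices = [[frozenset(c) for c in combinations(BONDS, k)] for k in RICHNESS]
values = [tetrad_numerator(*quad) for quad in product(*choices)]

scale = 5 * 4 * 3 * 2  # each tetrad-difference equals its numerator / sqrt(120)
mean = sum(values) / len(values)
variance = sum(v * v for v in values) / (scale * len(values))

print(len(values), mean, round(variance, 3))  # 27000 0.0 0.04
```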


9. A mind with twelve bonds.—But as the number of 
bonds in the mind increases, the tetrad-differences crowd 
closer and closer to zero. Let us, for example, suppose 
exactly the same experiment as above conducted in a 
universe of men whose minds could form twelve bonds 
(instead of six), the four tests requiring ten, eight, six, and 
four of these (instead of five, four, three, and two) (Thom- 
son, 1927b). This increase in complexity enormously 
increases the work of calculating all the possible patterns 
of overlap, and the frequency of each. There are now
1,257 different square tables of correlation coefficients and
still more patterns of overlap, some of which, however,
give the same correlations. When each possibility is taken
in its proper relative frequency (ranging from once to
11,520 times) there are no fewer than 1,078,110 instances
required to represent the distribution. They have,
nevertheless, all been calculated, and the distribution of
$F_1$ was as follows:

  $F_1 \times \sqrt{1920}$  Freq.  |  $F_1 \times \sqrt{1920}$  Freq.  |  $F_1 \times \sqrt{1920}$  Freq.  |  $F_1 \times \sqrt{1920}$  Freq.
      20         225        |       7       17,760       |      −3       81,482       |     −13          624
      18       1,800        |       6    [illegible]     |      −4       72,676       |     −14        3,792
      16       1,755        |       5    [illegible]     |      −5       53,808       |     −15        4,144
      15       4,600        |       4       52,085       |      −6       49,328       |     −16        3,970
      14       3,840        |       3    [illegible]     |      −7       21,240       |     −18          112
      12      19,610        |       2       42,984       |      −8    [illegible]     |     −19          456
      11      10,682        |       1       28,096       |      −9        5,896       |     −20          584
      10       8,360        |       0      129,699       |     −10       29,184       |     −24    [illegible]
       9    [illegible]     |      −1    [illegible]     |     −11        8,960       |
       8      87,735        |      −2       81,208       |     −12       15,672       |

  Total 1,078,110

This table again gives an average value of $F_1$ exactly
equal to zero. But the separate values of the tetrad-difference
are grouped more closely round zero than
before, with a variance now given by

$\sigma^2 = \dfrac{37{,}166{,}400}{1{,}920 \times 1{,}078{,}110} = 0.018$

This is rather less than half the previous variance.
Doubling the number of bonds in the imagined mind has
halved the variance of the tetrad-differences. If we were
to increase the number of potential bonds supposed to
exist in the mind to anything like what must be its true
figure, we would clearly reach a point where the tetrad-differences
would be grouped round zero very closely
indeed.

The principle illustrated by the above concrete example
can be examined by general algebraic means, and the above
suggested conclusion fully confirmed (Mackie, 1928a,
1929). It is found that the variance of the tetrad-differences
sinks in proportion to 1/(N − 1), where N is the
number of bonds, when N becomes large, and the above
example agrees with this even for such small N’s as 6 and
12: for

$\dfrac{6 - 1}{12 - 1} \times .040 = .018$ as found.

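
Both variances can be obtained without enumeration. The closed form below is our own working, offered as a sketch and not as Mackie's algebra. It assumes that each test is a random selection of its quota of bonds, made independently of the other tests, and it takes $F_1$ to be $r_{13} r_{24} - r_{14} r_{23}$ (an assumption, chosen because it reproduces both quoted figures).

```python
from fractions import Fraction

def tetrad_variance(n_bonds, richness):
    """Exact variance of r13*r24 - r14*r23 when each of four tests is an
    independent random choice of `richness[i]` bonds out of `n_bonds`."""
    N = n_bonds
    # p: chance a given bond is in the test; q: chance two given bonds both are.
    p = [Fraction(k, N) for k in richness]
    q = [Fraction(k * (k - 1), N * (N - 1)) for k in richness]

    def mean_square_overlap(i, j):
        """Expected square of the number of bonds common to tests i and j."""
        return N * p[i] * p[j] + N * (N - 1) * q[i] * q[j]

    # Expected value of the cross-product (overlap13*overlap24*overlap14*overlap23).
    cross = Fraction(1)
    for k in richness:
        cross *= Fraction(k * k, N)
    remainder = Fraction(N - 1)
    for p_i, q_i in zip(p, q):
        remainder *= p_i - q_i
    cross += remainder

    numerator = (mean_square_overlap(0, 2) * mean_square_overlap(1, 3)
                 + mean_square_overlap(0, 3) * mean_square_overlap(1, 2)
                 - 2 * cross)
    n1, n2, n3, n4 = richness
    return numerator / (n1 * n2 * n3 * n4)

six = tetrad_variance(6, (5, 4, 3, 2))
twelve = tetrad_variance(12, (10, 8, 6, 4))
print(round(float(six), 3), round(float(twelve), 3))  # 0.04 0.018
print(round(float(six) * 5 / 11, 3))                  # 0.018, the 1/(N - 1) rule
```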
In this mathematical treatment, bonds have been spoken 
of as though they were separate atoms of the mind, and, 
moreover, were all equally important. It is probably
quite unnecessary to make the former assumption, which 
may or may not agree with the actual facts of the mind, 
or of the brain. Suitable mathematical treatment could 
probably be devised to examine the case where the causal
background is, as it were, a continuum, different proportions 
of it forming tests of different degrees of richness. And as 
for the second assumption, it is in all likelihood merely 
formal. Let the continuum be divided into parts of equal 
importance, and then the number of these increased and
their extent reduced, keeping their importance equal. 
What is necessary, to give the result that zero tetrads are 
so highly probable, is that it be possible to take our tests
with equal ease from any part of the causal background; that
there be no linkages among the bonds which will disturb the
[…] in number. A sample of tests is taken. If this sample
is large and random, then there should, in a mind without
[…] most duplicates (which […] in a random sample),
[…] free to use its bonds […] comparatively unlinked.
[…] each ability is com[…] some approach to “all-or-none”
[…] that is, […] tends either not to come
into the […] with its full force. This
[…] what is meant by the rank
[…] more and more of the possible correlations
are taken. When the rank is 1 the tetrad-differences
are zero. But clearly, the reader may say,
taking more and more samples of the bonds to form more
and more tests will not change in any way the pre-existing
tetrad-differences, will not make them zero if they are not
zero to start with. That is perfectly true; but that is not 
what is meant. As more and more tests are formed by 
samples of the bonds, the number of zero and very small 
tetrads will increase and swamp the large tetrads. The 
sampling theory does not say that all tetrads will be 
exactly zero, or the rank exactly 1. It says that the 
tetrads will be distributed about zero (not because each 
is taken both plus and minus, but when all are given their 
sign by the same rule) with a scatter which can be reduced
without limit, in the sense that with more bonds the proportion
of large tetrads becomes smaller and smaller;
always provided all possible samples are taken, i.e. that 
the family of correlation coefficients is complete. 

With a finite number of tests this, of course, is not the 
case; but if the tests are a random sample of all possible 
tests, there will again be the approach to zero tetrads. 
The same will be true if the tests are sampling not the whole 
mind, but some portion of it, some sub-pool of our mind’s 
abilities. If we stray from this pool and fish in other 
waters, we shall break the hierarchy; but if we sampled
the whole pool of a mind, we should again find the tendency
to hierarchical order. If the mind is organized into sub-pools
(such as the verbal sub-pool, say), then we shall be
liable to fish in two or three of them, and get a rank of 
2 or 3 in our matrix, i.e. get two or three common factors, 
in the language of the other theory. 

10. Contrast with physical measurements.—The tendency 
for tetrad-differences to be closely grouped around zero 
appears to be stronger in mental measurements than else- 
where ; stronger, for example, than in physical measure- 
ments although it is found there too.

In physical measurements we do not measure a person’s 
body just from anywhere to anywhere. We observe organs 
and measure them—leg, cranium, chest girth, etc. The 
variates are not a random sample. In other words, the 
physical body has an obvious structure which guides our 
measurements, and the tendency to a low rank among the
correlation coefficients, although present, is less than among
mental measurements. The tendency to zero tetrad-differences
in the mind is due to the fact that the mind
has, comparatively speaking, no organs. We can, and do, 
measure it almost from anywhere to anywhere. No test 
measures a leg or an arm of the mind; every test calls 
upon a group of the mind’s bonds which intermingles in 
most complicated ways with the groups needed for other 
tests, without being a set pattern immutably linked into 
an organ. Of all the conceivable combinations of the 
bonds of the mind we can, without great difficulty, take a 
random sample, whereas in physical measurements we take 
only the sample forced on us by the organs of the body. 
Being free to measure the mind almost from anywhere to 
anywhere, we can get a set of measurements which show 
“hierarchical order ” without overgreat trouble. We can 
do so because the mind is so comparatively structureless.
Mental measurements tend to show hierarchical order, and
to be susceptible of mathematical description in terms of
one general factor and innumerable specifics, not
because there are specific neural machines through which 
its energy must show itself, but just exactly because there 
are no fixed neural machines. The mind is capable of 
expressing itself in the most plastic and Protean way, 
especially before education, language, the subjects of 
the school curriculum, the occupation, and the political 
beliefs of adult life have imposed a habitual structure on 
it. It is not without significance that the “ factor ” most 
widely recognized after Spearman’s g is the verbal factor v, 
the mother-tongue being, as it were, the physical body of 
the mind, its acquired structure.

11. Absolute variance [...] that on the sampling theory the tests
naturally have different [variances], the “richer” tests having a
wider scatter. This seems only natural. It is customary, [...],
to reduce all scores to standard measure, thereby equalizing
their variance. This seems inevitable, for there
is no means of comparing the scatter of marks in two
different tests. But it does not follow that the scatter
would be really the same if some means of comparison
were available. When the same test is given to two
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different groups we have no hesitation in ascribing a wider 
variance to the one or the other group, and it seems con- 
ceivable that a similar distinction might mentally be made 
between the scores made by one group in two different 
tests. The writer is completely in accord with M. S. Bart- 
lett when he says (Bartlett, 1935, 205): “I think many 
people would agree . . . that the variation in mathematical 
ability displayed even in a selected group such as Cam- 
bridge Tripos candidates cannot be altogether put down 
to the method of marking adopted by the examiners.” 
We may put these mathematics marks into standard 
measure, and we may put the marks scored by the same 
group in, say, a form-board test, also into standard measure. 
But that does not imply that at bottom the two variances 
are equal, if only we had some rigorous way of comparing 
them. Our common sense tells us plainly that they are 
not equal in the absolute sense, though for many purposes 
their difference is irrelevant. It seems to be no defect, 
then, but rather a good quality, of the sampling theory 
to involve different absolute variances. 

12. A distinction between g and other common factors.— 
The writer is inclined to make a distinction in interpretation 
between the Spearman general factor g and the various 
other common factors, mostly if not all of less extent than 
g, which have been suggested. When properly measured 
by a wide and varied hierarchical battery, g appears to him 
to be an index of the span of the whole mind, other common 
factors to measure only sub-pools, linkages among bonds. 
The former measures the whole number of bonds; the 
latter indicate the degree of structure among them. 

Some of this “ structure ” is no doubt innate ; but more 
of it is probably due to environment and education and 
life. Its expression in terms of separate uncorrelated 
factors suggests what is almost certainly not the case, that 
the “sub-pools ” are separate from one another. The 
actual organization is likely to be much more complicated 
than that, and its categories to be interlaced and inter-
woven, like the relationships of men in a community,
plumbers and Methodists, blonds, bachelors, smokers,
Conservatives, illiterates, native-born, criminals, and
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school-teachers, an organization into classes which cut 
across one another right and left. 

Further, it is improbable that the organization of each 
mind is the same. The phrase “ factors of the mind ” 
suggests too strongly that this is so, and that minds differ 
only in the amount of each factor they possess. It is more
than likely that different minds perform any task or test by
different means, and indeed that the same mind does so at
different times.


Yet with all the dangers [...] it, it is probable that the fa[...]
[...]g natural desire in mankind [...]nd to name, forces and powers
[...]at is observed, nor ca[...]

CHAPTER XXI 
SOME FUNDAMENTAL QUESTIONS 


It seems advisable to conclude with a brief discussion of
some of the fundamental theoretical questions needing an
answer. Among these are the following, of which (1)
and (3) are rather liable to be forgotten by those actually
engaged in making factorial analyses:

(1) What metric or system of units is to be used in 
factorial analysis ? 

(2) On what principle are we to decide where to stop the 
rotation of our factor-axes or how to choose them so that 
rotation is unnecessary ? 

(3) Is the principle of minimizing the number of 
common factors, i.e. of analysing only the communal 
variance, to be retained ? 

(4) Are oblique, i.e. correlated factors to be permitted ? 

1. Metric.—Most of the work done in factorial analysis
has assumed the scores of the tests to be standardized;
that is to say, in each test the unit of measure has been
the actual standard deviation found in the distribution.
This is in a sense a confession of ignorance. The accidental
standard deviation which happens to result from the par-
ticular form of scoring used in a test means, of course,
nothing more. Yet there is undoubtedly something to be
said for the probability of real differences of standard
deviation existing between tests (see Chapter XX, 
Section 11). In that case, if we knew these real standard 
deviations, we would use variances and covariances and 
analyse them, not correlations (compare Hotelling, 1933, 
421-2 and 509-10). 

Burt has urged the use of variances and covariances, 
which are indeed necessary to him to enable his relation
between trait factors and person factors to hold (see Chap-
ter XVII, page 264). But the variances and covariances
he actually uses are simply the arbitrary ones which arise 
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from the raw scores, and depend entirely upon the scoring 
system used in each test. It would seem necessary to 
have some system of rational, not arbitrary, units. 
Hotelling has already suggested one such, based upon 
the idea of the principal components of all possible tests, 
but it would seem to be unattainable in practice (Hotel- 
ling, 1933, 510). Another can be based on the ideas of the 
sampling theory and has already been foreshadowed in 
the previous chapter. Tests quite naturally have different 
variances on that theory, since they comprise larger or 
smaller samples of the “ bonds ” of the mind (see Thomson, 
1935b, 87). In a hierarchical battery these natural
variances are measured by the “ coefficient of richness.” 
The “richness” of Test $k$ is given by $r_{kg}^{2}$,
the same quantity as the square of Spearman’s “satura-
tion with g.” It is, on the sampling theory, the fraction
which the test forms of the pool of bonds which is being
sampled, and is the natural variance of the test in compari-
son with other tests from that pool. The “saturation
with g” of Spearman’s theory is the “natural standard
deviation” of the sampling theory. Even in a battery
which is not hierarchical, the formula (Chapter III,
Section 5, page 43)

$$\sqrt{\frac{A^{2} - A'}{T - 2A}}$$

will give a rough estimate of the natural standard deviation
of each test. The general principle is that tests which
show the most total correlation have the largest natural
variance.
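The formula quoted above may be tried on an artificial battery. The following is only a sketch under stated assumptions: A is taken to be the sum of the correlations of one test with all the others, A' the sum of the squares of those correlations, and T the sum of all off-diagonal entries of the full matrix (each coefficient counted twice). These definitions are not restated in this section and are supplied here as assumptions; the five saturations are invented, and the name natural_sd is hypothetical.

```python
from math import sqrt

def natural_sd(r, k):
    """Rough 'natural standard deviation' of test k from correlation matrix r."""
    n = len(r)
    a = sum(r[k][j] for j in range(n) if j != k)                     # A (assumed meaning)
    a_sq = sum(r[k][j] ** 2 for j in range(n) if j != k)             # A' (assumed meaning)
    t = sum(r[i][j] for i in range(n) for j in range(n) if i != j)   # T (assumed meaning)
    return sqrt((a * a - a_sq) / (t - 2 * a))

# Invented saturations of a perfectly hierarchical battery: r_ij = g_i * g_j.
saturations = [0.9, 0.8, 0.7, 0.6, 0.5]
n = len(saturations)
r = [[1.0 if i == j else saturations[i] * saturations[j] for j in range(n)]
     for i in range(n)]

# In the hierarchical case the formula returns each saturation exactly.
estimates = [natural_sd(r, k) for k in range(n)]
```

In a battery which is not hierarchical the same function gives only the rough estimate spoken of in the text.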

2. Rotation.—Our views on the rotation of factors will
depend on what we want them to do. Burt looks upon
them as merely a convenient form of classification and is
content to take the princi[...] any rotation. He “takes
[...] either by calculation or [...] of tests an average score
equal to the [...] average: each of the tests also having the same
[...] every other test in the battery over this sub-[...]
persons (Burt, 198384). He concentrates attention
[...] factors, which are “bipolar,” having both
[...] negative weights in the tests. When, [...] referred to,
he is analysing temperaments, [...] with common names for
emotional charac[...] names too are usually bipolar, as
[...], extravagant-stingy, extravert-introvert, [...]
[Thurstone], on the other hand, emphatically insists on
[the need] for rotation if the factors are to have psycho-
logical [meaning] (Thurstone, 1938, 90). The centroid
[...] averages of the tests which happen to
[...] battery, and change as tests are added or taken
[...] he wants factors which are invariant from
[battery] to battery. I think he would put invariance
[...] meaning, and say that if a certain
[factor] keeps turning up in battery after battery we must
[...] what its psychological meaning is. His
[...] backed up by a great deal of experimental
[...] a pioneering and exploratory nature, is that his
[principle] of rotating to “simple structure” gives us also
psychologically meaningful and invariant factors.
[The] problems of rotation and metric are not unconnected,
[...] piece of evidence in favour of rotating to simple
structure is that the latter is independent of the units
used in the tests. If instead of analysing correlations we
analyse covariances, with whatever standard deviations
we care to assign to the tests, we get a centroid analysis
[...] different from the centroid analysis of correlations.
[...] we rotate each to simple structure the tables are
identical except, of course, that in the covariance structure
[each] row is multiplied by the standard deviation of the
test. For example, if we take the six tests of Chapter XI
[...] 2 (page 152) and ascribe arbitrary standard
deviations of 1, 2, 3, 4, 5, and 6 to them, we can replace the
correlations and communalities by covariances and vari-
ance-communalities, and perform a centroid analysis.
Since we know the proper communalities* it comes out 
exactly in three factors with no residues, and gives the 
centroid structure : 


         I        II       III
1      .372     .567     .462
2      .948    1.278    -.060
3     1.969   -1.016    -.337
4     1.002    1.072   -2.118
5     2.992     .593    1.716
6     3.379   -2.493     .337


When this is rotated to simple structure, by post-
multiplication by the matrix

      .802     .389     .453
     -.592     .416     .691
      .080    -.822     .564


the resulting table is:

         A        B        C
1        .        .      .820
2        .      .950    1.278
3      2.154    .619      .
4        .     2.577      .
5      2.187      .     2.732
6      4.213      .       .
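The post-multiplication can be repeated by the reader. The sketch below, in Python, uses the figures as read from the two printed tables; where a digit of the print is illegible it has been taken here, as an assumption, to be the value which makes the product agree with the simple-structure table. The helper matmul and the variable names are hypothetical.

```python
# Centroid structure from covariances (rows: tests 1-6; columns: I, II, III).
centroid = [
    [0.372, 0.567, 0.462],
    [0.948, 1.278, -0.060],
    [1.969, -1.016, -0.337],
    [1.002, 1.072, -2.118],
    [2.992, 0.593, 1.716],
    [3.379, -2.493, 0.337],
]

# Orthogonal rotating matrix quoted in the text.
rotation = [
    [0.802, 0.389, 0.453],
    [-0.592, 0.416, 0.691],
    [0.080, -0.822, 0.564],
]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Simple structure in covariance units (columns A, B, C).
simple = matmul(centroid, rotation)

# Dividing each row by the standard deviation ascribed to its test
# (1, 2, ..., 6) puts the table back into correlation units.
sds = [1, 2, 3, 4, 5, 6]
rescaled = [[x / sd for x in row] for row, sd in zip(simple, sds)]
```

To three decimals the vanishing entries of `simple` fall where the dots stand in the table above, small departures being due to rounding in the printed figures.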


This is identical with the simple structure found from
the correlations, if the row[...]


* If we have to guess communalities, our two simple structures
will differ slightly because the highest covariance in a column may
not correspond to the highest correlation. But with a battery of
many tests this difference will be unimportant, and could be
annulled by iteration.
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property of independence of the metric. If the tetrad- 
differences of a matrix of correlations are zero, and we 
analyse into one general factor and specifics, it is immaterial 
whether we analyse correlations or covariances. The 
loadings obtained in the latter case are exactly the same 
except, of course, that each is multiplied by the appropriate 
standard deviation.
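The statement about the single general factor can be checked numerically. The following Python sketch is an illustration only: the four saturations and the four standard deviations are invented, and the loading of each test is obtained from a triad of coefficients, $\sqrt{r_{ij} r_{ik} / r_{jk}}$, which is one convenient route among several when the tetrad-differences vanish.

```python
from math import sqrt

def tetrad(m, a, b, c, d):
    """Tetrad-difference m_ac * m_bd - m_ad * m_bc."""
    return m[a][c] * m[b][d] - m[a][d] * m[b][c]

def general_loading(m, i, j, k):
    """Loading of test i on the general factor, from the triad (i, j, k)."""
    return sqrt(m[i][j] * m[i][k] / m[j][k])

g = [0.9, 0.7, 0.6, 0.4]    # invented saturations
sd = [1.0, 2.0, 3.0, 4.0]   # arbitrary standard deviations
n = len(g)

# Hierarchical correlations (communalities in the diagonal) and covariances.
corr = [[g[i] * g[j] for j in range(n)] for i in range(n)]
cov = [[sd[i] * sd[j] * corr[i][j] for j in range(n)] for i in range(n)]

corr_loadings = [general_loading(corr, i, (i + 1) % n, (i + 2) % n) for i in range(n)]
cov_loadings = [general_loading(cov, i, (i + 1) % n, (i + 2) % n) for i in range(n)]
```

Each entry of `cov_loadings` is the corresponding entry of `corr_loadings` multiplied by the standard deviation of that test, and the tetrad-differences of the covariances remain zero.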

At this point one is reminded of Lawley’s loadings*
found by the method of maximum likelihood, for these
possess the property that the unrotated loadings obtained
from correlations are already the same as the unrotated
loadings obtained from covariances, if the latter are 
divided by the standard deviations. Centroid analyses, 
or principal component analyses, do not possess this 
property. The loadings obtained by these means from 
covariances cannot be simply divided by the standard 
deviations to give the loadings derived from correlations, 
though the one can be rotated into the other. Lawley’s 
loadings need no such rotation. They are, as it were, at 
once of the same shape whether from covariances or from 
correlations and only need an adjustment of units, such as 
one makes in changing, say, from yards to feet. A field 
which is 50 yards broad and 20 poles long has the same 
shape as one which is 150 feet broad and 330 feet long. 
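That principal components lack this property may be seen in the smallest possible case, two tests. The Python sketch below is an illustration under assumptions: the correlation of .5 and the standard deviations of 1 and 3 are invented, and the first component of a symmetric 2 x 2 matrix is found from the closed form for its largest latent root. The function name is hypothetical.

```python
from math import sqrt

def first_component_loadings(a, b, d):
    """Loadings on the first principal component of [[a, b], [b, d]], b != 0."""
    lam = (a + d) / 2 + sqrt(((a - d) / 2) ** 2 + b * b)   # largest latent root
    v = (b, lam - a)                                        # its latent vector
    norm = sqrt(v[0] ** 2 + v[1] ** 2)
    return [sqrt(lam) * v[0] / norm, sqrt(lam) * v[1] / norm]

r = 0.5              # invented correlation between the two tests
sd = [1.0, 3.0]      # arbitrary standard deviations

# Analysis of the correlations: both tests are loaded equally.
from_corr = first_component_loadings(1.0, r, 1.0)

# Analysis of the covariances, then division by the standard deviations.
from_cov = first_component_loadings(sd[0] ** 2, sd[0] * sd[1] * r, sd[1] ** 2)
rescaled = [x / s for x, s in zip(from_cov, sd)]
```

Here `from_corr` gives equal loadings to the two tests, while `rescaled` does not; mere division by the standard deviations fails to bring the one analysis into agreement with the other.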

Now, as we have seen, this property of equivalence of 
covariance and correlation loadings is also possessed by 
simple structure. It would thus not be unnatural to hope 
that Lawley’s method might lead straight to simple 
structure, without any rotation. But this is not the case. 
Clearly, then, simple structure is not the only position of 
the axes where the loadings are independent of the units of 
measurement employed. Indeed, any subsequent post- 


* In accordance with our definition on page 170, the term “load-
ing” means a coefficient in a specification equation, an entry in a
“pattern.” In the present chapter it is used throughout and is
strictly correct when the axes referred to are orthogonal. If the
axes are oblique, then much of what is said really refers to the items
in a structure, not in a pattern: but the word “loading” is still used
to avoid circumlocutions, and because the structure of the reference
vectors is, except for a diagonal matrix multiplier, identical with the
pattern of the factors.
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multiplication of both the simple structure tables—both 
that from correlations and that from covariances—by the 
same orthogonal rotating matrix will leave their equivalence 
with regard to units unharmed. Simple structure is only 
one of an infinite number of positions which possess this 
property. But it is an easily identifiable one.
It is difficult to keep one’s mind clear as to the meaning
of this. Let me recapitulate. There are some processes
of analysis which, while they give a perfect analysis in the 
sense of one which reproduces the correlations (or the co- 
variances) exactly, do not give the same analysis for the 
correlations as for the covariances. The factors they 
arrive at depend upon the units of measurement employed 
in the tests. Such, for example, are the principal compon- 
ents process and the centroid process. Such processes 
cannot be relied on to give straight away and without 
rotation, factors which can be called objective and scien- 
tific. Some processes, on the other hand, do give analyses 
which are independent of the units. One such is Lawley’s, 
based on maximum likelihood. Another is Thurstone’s 
simple-structure process, which, though it begins by using 
a centroid analysis, follows this by rotation of a certain kind. 
But the principle of independence of units does not 
distinguish between these processes, which both satisfy it. 
Still less does it distinguish between systems of factors. 
For any one of the infinite number of such systems which 
can be got from either simple structure or Lawley’s factors 
by rotation equally satisfies the principle. Indeed, there 
can really be no talk of a system of factors satisfying the 
principle. Any table of loadings whatever, obtained from 
correlations, has, of course, corresponding to it a system 
differing only in that the rows are multiplied by coefficients, 
a system which would correspond with covariances. 
The fact that no one has discovered a process which gives 
both is irrelevant. The argument is rather as follows. If 
a worker believes that he has found a process which gives 
the true psychological factors, then that process must be 
independent of the metric, and simple structure and 
maximum likelihood are both thus independent, though 
they do not, alas, agree. Nor must it be forgotten that 
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analyses from correlations are in no way superior to those 
from covariances. Indeed, correlations are covariances, 
dependent upon as arbitrary a choice of units—namely 
standard deviations—as any other. But centroid axes 
in themselves, or principal components, without rotation, 
are clearly inadmissible, for they change with the units 
used. The chance that such axes are the true ones is 
infinitesimal, being dependent on the chance composition 
of the battery, and the system of units which chances to 
be used. Independence of metric is not sufficient to 
validate a process but it is necessary. Its absence does 
not prove a system of factors to be wrong, but it makes it 
certain that the process by which they have been arrived 
at does not in general give the true factors.

3. Specifics.—These form a fundamental problem in
factorial analysis and yet they are practically never heard 
of in discussions of an analysis. It is reasonable enough to 
think that a test may require some trick of the intellect 
peculiar to itself, yet it is not obvious that these specific 
factors must be made as large and important as possible ; 
and that is what the plan of minimizing the rank of a 
matrix does. The excess of factors over tests which 
inevitably, of course, results from postulating a specific in 
every test, means that the factors cannot be estimated with 
any great accuracy. Usually the accuracy is very low 
indeed. The determinate and the indeterminate parts of 
each of Thurstone’s factors in Primary Mental Abilities can 
be found by post-multiplying Table 7 on his page 98 by 
Table 3 on his page 96. We find : 


            Variance of the     Variance of the
Factor      Estimated Part      Indeterminate Part

S               .611                .389
P               .616                .384
N               .825                .175
V               .662                .338
M               .431                .569
W               .439                .561
I               .397                .603
R               .600                .400
D               .519                .481

The average for the nine factors is only 56.4 per cent. of
the variance estimated. In other words the factor
estimates have large probable errors, in some cases as large
as the estimates themselves. This has serious conse-
quences, not to be overcome by more reliable tests. 
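The division of a factor's variance into an estimated and an indeterminate part can be illustrated in the simplest case, that of a single general factor. The Python sketch below does not reproduce Thurstone's nine-factor calculation, which needs his tables; it assumes instead one general factor, uses the known regression result that the estimated part is $S/(1+S)$ with $S = \sum l^{2}/(1-l^{2})$ over the loadings $l$, and takes five invented loadings of .6. The function name is hypothetical.

```python
def estimated_variance(loadings):
    """Variance of the regression estimate of a single general factor."""
    s = sum(l * l / (1 - l * l) for l in loadings)
    return s / (1 + s)

battery = [0.6] * 5                       # invented loadings
determinate = estimated_variance(battery)  # estimated part of the factor variance
indeterminate = 1 - determinate            # part left indeterminate
```

With these figures rather more than a quarter of the variance of the factor remains indeterminate, although every test has a substantial loading.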

Using unity for every diagonal element in the matrix of 
such a battery will give factors (supposing the same 
number of them to be taken out) which will not imitate the 
correlations quite so well, but which can be estimated 
accurately. 

In fact, whether Hotelling’s process or the centroid
process is used, with unit communalities, each factor can 
be calculated exactly for a man, given his scores. By 
exactly we mean that they are as accurate as his scores are. 
Of course, in any psychological experiment the scores may 
not be accurate in the sense that they can be exactly 
reproduced by a repetition of the experiment. Apart from 
sheer blunders and clerical errors, there is the fact that a 
man’s performance fluctuates from day to day. But 
these errors are common to any process of calculation 
which may be used on the scores. These are not the errors
for which we are criticizing estimates of a man’s factors.
The point we are making is that factors based on com-
munalities less than unity have a further, and large, error
of estimation, whereas factors based on unit communalities 
(even if only one or two or a few are taken out) have no 
such further error of estimation. 

If a few such factors taken out with unit communalities 
are then rotated (keeping them in the same space, i.e. not
changing their number) they still remain susceptible of
exact estimation in a man.

As soon, however, as any fractions, minimum or not, are 
placed in the diagonal cells, we have thereby decided to 
use, in describing our tests, more orthogonal axes than there 
are tests ; for each test has then a specific factor, and there 
are in addition the common factors. This means in terms 
of our spatial model that none of the axes, neither the 
common factors nor the specific factors, are in the test 
space at all (except at the origin where they all cross). It 
is only about the test space, of dimensions equal to the 
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number of tests, that we have any information from our 
battery. These axes are away in outer darkness and we 
cannot know them, but only their projections or shadows 
on the test space. Psychologists invariably confine their 
attention, after making an analysis using communalities, 
to the “ common factor space,” of a comparatively small 
number of dimensions, without, I think, being usually 
aware that this space is not in the test space at all. (Thur- 
stone’s “ secondary factors,” in their turn, are not even in 
the common factor space, for he uses what I might call 
secondary communalities.) The effect of all this is that the
factors arrived at by an analysis which has begun by 
placing fractions in the diagonal cells can never be measured 
in any man, but only vaguely estimated, and with maxi- 
mum vagueness if minimum communalities are used. 

In itself the fact that factors can only be estimated and 
not accurately measured is, of course, not fatal. Through- 
out statistical work runs the idea of estimation in a realm 
outside that which is experimentally known, in a realm of
more dimensions than that in which our measurements
have been made. It is to allow for that that the device of
“degrees of freedom” is used in the analysis of variance.
But in factorial analysis the vagueness due to estimation 
is deliberately maximized, for reducing the rank of a matrix 
of correlations involves the simultaneous maximizing of 
the specific variances. In Section 3 of the previous chapter 
a brief reference was made to this fact that methods of 
factorizing which use communalities maximize the variance 
of the specific factors, by reason of minimizing the number 
of common factors. First take the case of the analysis of a 
hierarchical battery. As was illustrated in Chapter XX 
the analysis of such a battery into one general factor only, 
and specifics, gives the maximum variance possible to the 
specifics. The combined communalities of the tests are
less in the two-factor analysis than in any other analysis.
The mathematical expression of this is that the trace of the
reduced correlation matrix, i.e. the sum of the cells of the
principal diagonal, is a minimum.

It is true that certain exceptions to this statement are
mathematically possible, but their occurrence in actual
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psychological work is a practical impossibility. They have 
been investigated by Ledermann (Ledermann, 1940), who 
finds, in the case of the hierarchical matrix, that an excep- 
tion is only possible when one of the g saturations is greater 
than the sum of all the others. When the battery is of 
any size, this is most unlikely to occur : and almost always, 
when it did occur, the large saturation of one test would 
turn out to be greater than unity, which is not permissible 
(the Heywood case). 

The same statement as the above, that the specifics are 
maximized, is also true in general. The communalities 
which give the matrix its lowest rank are in sum less than 
any other diagonal elements permissible. If smaller 
numbers are placed in the diagonal cells, the analysis fails 
unless factors with a loading of $\sqrt{-1}$ are employed, and
such factors are, of course, inadmissible. 
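How an imaginary loading arises can be seen in a fragment of two tests. The Python sketch below is an illustration only, with invented figures: two tests of a hierarchical battery with g saturations .8 and .6 correlate .48, and their proper communalities are .64 and .36. If a factor is drawn through the first test so as to account for the correlation, what is left of the second communality is $h_2 - r_{12}^{2}/h_1$, and the square root of this remainder is the loading still required. The function name is hypothetical.

```python
def residual_communality(h1, h2, r12):
    """Communality left to test 2 after a factor through test 1 absorbs r12."""
    return h2 - r12 * r12 / h1

g1, g2 = 0.8, 0.6      # invented g saturations
r12 = g1 * g2          # the correlation they imply

# With the proper communalities nothing is left over.
proper = residual_communality(g1 * g1, g2 * g2, r12)

# With a smaller number (.60 instead of .64) in the first diagonal cell
# the remainder is negative, so the loading needed is imaginary.
reduced = residual_communality(0.60, g2 * g2, r12)
needs_imaginary_loading = reduced < 0
```

The remainder in the second case is a negative quantity, and a loading equal to its square root is of the inadmissible kind referred to in the text.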

Here again there are possibly cases where the lowest 
rank is not accompanied by the lowest trace (i.e. the lowest 
sum of the communalities). But here again it seems cer- 
tain that if such cases do exist, they are mathematical 
curiosities which would never occur in practice. 

If specific factors of such large size have any psycho- 
logical existence, what can they be? Possibilities which 
will occur to us are first, that they are error factors—but 
errors or variations in the subject’s performance are not 
likely to be entirely unique to one test. Secondly, they 
have been attributed to sampling errors in the coefficients 
of correlation—but these sampling errors are themselves 
correlated, and so give rise to false common factors, not to 
specific factors. Thirdly, they may be real mental factors, 
unique to that test, needed only by it. But what remark- 
able consequences follow if we accept that. I devise a 
brand-new test and, lo, in the mind of man there exists a 
specific ability to do that test and, moreover, an ability 
which is useless in every other activity. Further, every 
individual I meet possesses this specific ability in large 
or small amount. The idea in this form is really fan- 
tastic. 

It would seem then that the specifics cannot be really 
unique, but only unique in this battery. This leads to the
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presumption that the tests of a battery possess specific 
factors only because there does not happen to be in the 
battery any other test to share the specific, or at least part 
of it, and prove it to be really one or more common factors. 
On this view, specifics will disappear when a test has been 
tried in a large number of batteries, or in a sufficiently large 
battery. Not only does this seem unlikely when one 
considers that in every battery the minimum communalities 
and maximum specifics are insisted on, but it also has
peculiar consequences in regard to the number of primary 
factors. Consider a battery consisting of, say, two dozen 
tests, analysed into, say, seven common factors plus, of 
course, two dozen specifics. The latter, it must be re- 
membered, are all orthogonal, all uncorrelated with one 
another. On the hypothesis that they are really factors 
which just do not happen to have found a partner, like 
wallflowers at a ball, there must exist at least two dozen 
other primary factors waiting to be discovered in a larger 
battery. And so with every battery of tests. The number 
of primary factors must be larger than all the tests hitherto 
invented, which does not seem to be parsimonious. I can- 
not help fearing that there is something wrong with the idea 
of reducing the matrix of correlation coefficients or co- 
variances to its lowest possible rank, and then calling the 
descriptive variates to which this leads “factors of the
mind”: something wrong with the whole idea of attri-
buting as much as possible of the variance of a test to a
unique factor, something wrong with the “parsimony”
argument upon which all this is based. It leads to too
many difficulties to which it is possible, but not, I think,
advisable, to shut one’s eyes. Moreover, the reciprocity
principle, which identifies factors and loadings obtained 
from correlating tests with loadings and factors obtained 
from correlating persons, works only when there are no 
specifics involved. I would like to see a number of existing 
squares of correlation coefficients re-analysed with full
variance in each diagonal cell and the results considered. 
There would be no guessing of the communalities, and no 
repetitions or iterations of the calculation to determine 
them. Tests of significance of residues would be more 
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easily made, and although rather more factors would be 
necessary before the residues became insignificant, they 
would have the advantage of more accurate estimation in 
any man. True, such factors would be confined to the 
particular test space of that battery, and admittedly a 
factor of the mind is not likely to be an exact composite of 
the tests of any one battery. But the point is an academic 
one, for the common-factor space in which communality 
factors exist, is just as much a creation of the particular 
battery as are axes determined within the battery 
space. 

I must not be misunderstood as saying that no specific 
factors exist at all. What I am sceptical about is the pro- 
cedure of making the specific factors in every battery as 
large as possible, by the automatic application of a mathe- 
matical device. That every test may well have some 
unique quality for any individual person seems conceivable, 
though I do not think this special feature of the test will 
be felt as a peculiarity by every person who tries the test. 
I think any such unique quality would be a blemish in the 
test, just as unreliability is a blemish, and that the psych-
ologist should endeavour to make tests which are neither
unreliable nor burdened with unique peculiarities. Prob- 
ably he cannot avoid a certain amount of uniqueness, just 
as he cannot avoid a certain amount of unreliability. But 
I do not see the need for ascribing maximum uniqueness in
order to reduce the number of common factors. 

A critic may point out that, if even small truly unique 
parts of the tests are admittedly present, there will always 
be the need for the large number of specifics. Possibly so,
but specifics of no great importance, if the tests are
good ones; specifics with an influence as unimportant as 
the causes are of the residuals which we in any case ignore 
after statistical testing. 

It is true that by the use of communalities the total 
number of loadings to be estimated is reduced to a mini- 
mum. That way of putting the parsimony argument 
would be perhaps defensible. What I doubt is whether too 
high a price is not paid, since this same procedure maxi- 
mizes the specifics, and decides their importance without
any psychological consideration whatever being given to 
the question. 

The practical conclusions I would draw from these con- 
siderations about the nature of specific factors are that a 
battery used for factorial analysis should be composed of 
tests of high communality in that battery: or that, if 
tests are admitted which by the mathematical principle 
of rank reduction are allotted low communalities, the 
psychologist should agree that these tests do draw, each 
of them, upon factors of the mind not represented elsewhere 
in the battery. 

Such is the argument against minimum communalities. 
For them is the hope that some day, despite their draw- 
backs, the factors they lead to may prove to be something 
real, perhaps have some physiological basis. And their 
defender may plead that the estimates of these factors are 
as good as the estimates we find useful, in predicting 
educational or occupational efficiency.

4. Oblique factors.—I think it is pretty certain that
Thurstone took to oblique factors because he wants simple
structure at all costs. Certainly oblique factors make it
much easier to reach simple structure—too easy, Reyburn 
and Taylor say. It will be found far more often than it 
really exists, they add. On the other hand, Thurstone 
can point to his box example and his trapezium example 
and say with truth that simple structure enabled him 
to find “ realities,” can say that the oblique simple struc- 
ture is something more real, in the ordinary common-sense 
everyday use of the word, than the orthogonal second- 
order factors which are an alternative. 

Other workers, not at all wedded to the ideas of simple
structure, have also declared their belief in oblique factors,
e.g. Raymond Cattell, and, I think, many who feel inclined
to work in terms of “ clusters.” In ordinary life, weight 
and height are both measures of something real, although 
they are correlated. We could analyse them into two 
uncorrelated factors a and b, or into three for that matter, 
but certainly no one would use these in ordinary life. It
is, however, just conceivable that some pair of hormones
(say) might be found which corresponded, not one of them
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to height and one to weight, but one to orthogonal factor 
a and another to orthogonal factor b. It is far too early 
to state anything more than a preference for orthogonal 
or oblique factors. Opinion is turning, I think, toward 
the acceptance of the latter. 
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PARAGRAPHS 

1. Textbooks on matrix algebra. 2. Matrix notation.
3. Spearman’s Theory of Two Factors. 4. Multiple common
factors. 5. Orthogonal rotations. 6. Orthogonal transformation
from the two-factor equations to the sampling equations.
7. Hotelling’s “principal components.” 8. The pooling square.
9. The regression equation. 9a. Relations between two sets
of variates. 10. Regression estimates of factors. 10a. Ledermann’s
short cut. 11. Direct and indirect vocational advice.
12. Computation methods. 13. Bartlett’s estimates of factors.
14. Indeterminacy. 15. Finding g saturations from an
imperfectly hierarchical battery. 16. Sampling errors of
tetrad-differences. 17. Selection from a multivariate normal
population. 17a. Maximum likelihood estimation (by D. N.
Lawley). 18. Reciprocity of loadings and factors in persons
and traits. 19. Oblique factors. Structure and pattern.
19a. Second-order factors. 20. Boundary conditions. 21.
The sampling of bonds.


1. Textbooks on matrix algebra—Some knowledge of
matrix algebra is assumed, such as can be gained from the 
mathematical introduction to L. L. Thurstone’s Multiple 
Factor Analysis (Chicago, 1947); Turnbull and Aitken’s 
Theory of Canonical Matrices, Chapter I (London and 
Glasgow, 1932); H. W. Turnbull’s The Theory of Deter- 
minants, Matrices, and Invariants, Chapters I-V (London 
and Glasgow, 1929); and M. Bôcher’s Introduction to
Higher Algebra, Chapters II, V, and VI (New York, 1936). 

I have adopted Thurstone’s notation in Sections 19 
and 19a of the mathematical appendix, and in Chapters 
XI, XII, and XII in describing his work. But I have not 
made the change elsewhere because readers would then be 
incommoded in consulting my own former papers. 

The chief differences are as follows : 

My M is Thurstone’s F, for centroid factors, my Z is
Thurstone’s S ÷ √N, and my F is Thurstone’s P ÷ √N.

2. Matrix notation.—Let X be the matrix of raw scores
of p persons in n tests, with n rows and p columns; and
when normalized by rows, let it be denoted by Z. The
letters z and Z in the text of this book mean standardized
scores, which are used in practical work, but in this
appendix they mean normalized scores, so that—

$$ZZ' = R \tag{1}$$

the matrix of correlations between n tests.

For many purposes it is convenient to think of solid
matrices like Z as column (or row) vectors of which each
element represents a row (or column). Thus Z can be
thought of as a column vector z, of which each element
represents in a collapsed form a row of test scores. Thus
with three tests and four persons—

$$z = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} z_{11} & z_{12} & z_{13} & z_{14} \\ z_{21} & z_{22} & z_{23} & z_{24} \\ z_{31} & z_{32} & z_{33} & z_{34} \end{bmatrix} = Z \tag{2}$$

In the theory of mental factors each score is represented
as a loaded sum of the normalized factors f, the loadings
being different for each test, i.e.—

$$z = Mf \quad \text{(specification equations)} \tag{3}$$

where M is the matrix of loadings and f the vector of v
factors, collapsed into a column from F, the full matrix,
of dimensions v × p.
We note that p = number of persons,
             n = number of tests,
             v = number of factors.

The dimensions of M are n × v. Equation (3) represents
n simultaneous equations, and the form Z = MF represents
np simultaneous equations.


We now have—

$$R = ZZ' = (MF)(MF)' = MFF'M' \tag{4}$$

If the factors are orthogonal, we have—

$$FF' = I \tag{5}$$

the unit matrix, and therefore—

$$R = MM' \tag{6}$$

The resemblance in shape between this and—

$$R = ZZ' \tag{1}$$

leads to a parallelism between formulae concerning persons
and factors (Thomson, 1935b, 75; Mackie, 1928a, 74, and
1929, 34).
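Equation (1) can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated raw scores; the helper name `normalize_rows` and all numerical values are illustrative and are not taken from the text.

```python
import numpy as np

def normalize_rows(X):
    """Centre each row (one test) and scale it to unit sum of squares."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc / np.sqrt((Xc ** 2).sum(axis=1, keepdims=True))

rng = np.random.default_rng(0)
n_tests, n_persons = 3, 50              # n and p of the text
X = rng.normal(size=(n_tests, n_persons))
X[1] += 0.8 * X[0]                      # make two of the tests correlate
Z = normalize_rows(X)                   # normalized scores, n rows and p columns
R = Z @ Z.T                             # equation (1)
```

With normalized (not merely standardized) scores no division by the number of persons is needed, which is why the appendix can write R = ZZ' directly.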

3. Spearman’s Theory of Two Factors assumes that M
is of the special form—

$$M = \begin{bmatrix} l_1 & m_1 & & & \\ l_2 & & m_2 & & \\ \vdots & & & \ddots & \\ l_n & & & & m_n \end{bmatrix}, \qquad l_i^2 + m_i^2 = 1 \tag{7}$$

and therefore—

$$R = ll' + M_1^2 \tag{8}$$

where $M_1$ is the diagonal matrix which forms the right-
hand end of M, and l is the first column of M. In this
form it is clear that R is of rank 1 except for its principal
diagonal. Its component ll' is the “reduced correlational
matrix” of the Spearman case, and is entirely of rank 1.
The elements $l_1^2, l_2^2, \ldots, l_n^2$ which form the principal
diagonal of ll', are called “communalities.”

4. Multiple common factors—When more than one
common factor is present, M takes the form—

$$M = (M_0 \;\vdots\; M_1) \tag{9}$$

where $M_0$ is the matrix of loadings of the common factors,
represented in the Spearman case by the simple column l.
We have then—

$$R = MM' = M_0M_0' + M_1^2 \tag{10}$$

where the “reduced correlation matrix” $M_0M_0'$ is of
rank v, the number of common factors, and is identical
with R except for having “communalities” in its principal
diagonal.

5. Orthogonal rotations.—If we express the v factors f in
terms of w new factors φ by the equation—

$$f = A\varphi \tag{11}$$

where A is a matrix of v rows and w columns, we have—

$$z = Mf = MA\varphi \tag{12}$$

an expression of the tests z as linear loaded sums of a
different set of factors, with a matrix of loadings MA.

If—

$$AA' = I \tag{13}$$

the new factors φ are orthogonal like the old ones. They
can be as numerous as we like, but not less than the number
of tests unless the matrix R is singular. (12) represents a
rigid rotation of the orthogonal axes f into new positions,
with dimensions added or abolished.

6. The sampling theory—The following transformation
is of interest as showing the connexion between the
Theory of Two Factors and the Sampling Theory (Thomson,
1935b, 85). We shall write it out for three tests only,
but it is quite general. Consider the orthogonal matrix:

$$\left[\begin{array}{c|ccc|ccc|c}
lll & mll & lml & llm & mml & mlm & lmm & mmm \\ \hline
mll & -lll & mml & mlm & -lml & -llm & mmm & -lmm \\
lml & mml & -lll & lmm & -mll & mmm & -llm & -mlm \\
llm & mlm & lmm & -lll & mmm & -mll & -lml & -mml \\ \hline
mml & -lml & -mll & mmm & lll & -lmm & -mlm & llm \\
mlm & -llm & mmm & -mll & -lmm & lll & -mml & lml \\
lmm & mmm & -llm & -lml & -mlm & -mml & lll & mll \\ \hline
mmm & -lmm & -mlm & -mml & llm & lml & mll & -lll
\end{array}\right] \tag{14}$$

wherein the omitted subscripts 1, 2, and 3 are to be
understood as existing always in that order, so that mll
means $m_1l_2l_3$.

If we take for A in Equation (12) the first four rows
of this orthogonal matrix, and for M the Spearman form
(7) with three tests, the result is to transfer to eight new
factors, yielding:

$$\begin{aligned}
z_1 &= l_2l_3\varphi_1 + m_2l_3\varphi_3 + l_2m_3\varphi_4 + m_2m_3\varphi_7 \\
z_2 &= l_1l_3\varphi_1 + m_1l_3\varphi_2 + l_1m_3\varphi_4 + m_1m_3\varphi_6 \\
z_3 &= l_1l_2\varphi_1 + m_1l_2\varphi_2 + l_1m_2\varphi_3 + m_1m_2\varphi_5
\end{aligned} \tag{15}$$

Each z is here in normalized units. If, however, we
change to new units by multiplying the three equations
by $l_1$, $l_2$, and $l_3$ respectively, we have:

$$\begin{aligned}
l_1z_1 &= l_1l_2l_3\varphi_1 + l_1m_2l_3\varphi_3 + l_1l_2m_3\varphi_4 + l_1m_2m_3\varphi_7 \\
l_2z_2 &= l_1l_2l_3\varphi_1 + m_1l_2l_3\varphi_2 + l_1l_2m_3\varphi_4 + m_1l_2m_3\varphi_6 \\
l_3z_3 &= l_1l_2l_3\varphi_1 + m_1l_2l_3\varphi_2 + l_1m_2l_3\varphi_3 + m_1m_2l_3\varphi_5
\end{aligned} \tag{16}$$

and the variates $l_1z_1$, $l_2z_2$, and $l_3z_3$ are now susceptible of
the explanation that each is composed of $l_i^2N$ small equal
components drawn at random from a pool of N such
components, all-or-none in nature. In that case $l_1^2l_2^2l_3^2N$
components would probably appear in all three drawings
($\varphi_1$); $l_1^2l_2^2m_3^2N$ components would probably appear in the
first two drawings, but not in the third ($\varphi_4$); and so on
down to $m_1^2m_2^2m_3^2N$ components, which would not appear
at all ($\varphi_8$, which is missing from the equations).

The transformation can, of course, be reversed, and the
sampling theory equations converted into the two-factor
equations.
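The transformation of this section can be verified numerically. The sketch below assumes NumPy and three hypothetical saturations; it builds matrix (14) as a reordered Kronecker product of three 2 × 2 orthogonal blocks. That construction is a convenience of the sketch, not a device used in the text.

```python
import numpy as np

l = np.array([0.9, 0.8, 0.6])            # hypothetical g saturations l_1, l_2, l_3
m = np.sqrt(1.0 - l ** 2)                # so that l_i^2 + m_i^2 = 1

def block(li, mi):
    """2 x 2 orthogonal block belonging to one test."""
    return np.array([[li, mi], [mi, -li]])

# Kronecker product of the three blocks, then rows and columns reordered to
# the order lll, mll, lml, llm, mml, mlm, lmm, mmm used in matrix (14).
K = np.kron(block(l[0], m[0]), np.kron(block(l[1], m[1]), block(l[2], m[2])))
order = [0b000, 0b100, 0b010, 0b001, 0b110, 0b101, 0b011, 0b111]
Q = K[np.ix_(order, order)]              # matrix (14)

M = np.column_stack([l, np.diag(m)])     # Spearman form (7), 3 tests x 4 factors
A = Q[:4]                                # first four rows of (14)
loadings = M @ A                         # coefficients of the eight new factors, (15)
R = loadings @ loadings.T                # correlations implied by (15)
```

The last column of `loadings` is zero, which is the numerical counterpart of the eighth factor being missing from equations (15).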

7. Hotelling’s “principal components” are the principal
axes of the ellipsoids of equal density—

$$z'R^{-1}z = \text{constant} \tag{17}$$

when the test vectors are orthogonal axes (Hotelling, 1933).
To find the principal axes involves finding the latent
roots of $R^{-1}$. The Hotelling process consists of (a) a
rotation of the axes from the orthogonal test axes to the
directions of the principal axes; and (b) a set of strains
and stresses along these new axes to standardize the factors,
making the ellipsoid spherical and the original axes oblique.
The transformation from the tests to the Hotelling factors
γ being from Equation (3)—

$$z = M\gamma \quad (M \text{ square})$$

the ellipsoids (17) become—

$$\text{constant} = z'R^{-1}z = \gamma'(M'R^{-1}M)\gamma = \gamma'\gamma \tag{18}$$

since they become spheres. Therefore we must have—

$$M'R^{-1}M = I \tag{19}$$

The locus of the mid points of chords of $z'R^{-1}z$ whose
direction cosines are h' is the plane $h'R^{-1}z = 0$, and if this
is a principal plane it is at right angles to the chords it
bisects, i.e.—

$$h'R^{-1} = \lambda h'$$

which has non-trivial solutions only for—

$$|R^{-1} - \lambda I| = 0$$

the roots λ of which are the “latent roots” of $R^{-1}$, while
each h' is a “latent vector.”
Now, if H is the matrix of normalized latent vectors of
$R^{-1}$, we have—

$$H'R^{-1}H = \Lambda$$

where Λ is the diagonal matrix of the latent roots of $R^{-1}$;
so that a solution for M corresponding to rotation to the
principal axes and subsequent change of units to give a
sphere is seen to be—

$$M = H\Lambda^{-1/2} \tag{20}$$

The latent vectors of R are the same as those of $R^{-1}$,
or of any power of R, and Hotelling’s process described
in the text (Chapter VII) finds the latent roots (forming the
diagonal matrix D) and the latent vectors (forming H) of
R. We then have—

$$M = HD^{1/2} \tag{21}$$

For the convergence of the process, see Hotelling’s paper
of 1933, pages 14 and 15.
Since in Hotelling analyses M is square, we can write—

$$\gamma = M^{-1}z = (HD^{1/2})^{-1}z = D^{-1/2}H'z = D^{-1}(D^{1/2}H')z = D^{-1}M'z \tag{22}$$

Each factor γ, that is, can be found from a column of
the matrix M, divided by the corresponding latent root,
used as loadings of the test scores z.
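Equations (19), (21), and (22) can be illustrated as follows. This is a minimal sketch assuming NumPy; the correlation matrix and the score vector are hypothetical, and `numpy.linalg.eigh` stands in for the iterative process of Chapter VII.

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])          # hypothetical correlation matrix
roots, H = np.linalg.eigh(R)             # latent roots and normalized latent vectors of R
D = np.diag(roots)
M = H @ np.sqrt(D)                       # equation (21), M = H D^(1/2)

z = np.array([1.2, -0.3, 0.5])           # one person's (hypothetical) test scores
gamma = np.linalg.inv(D) @ M.T @ z       # equation (22), factors from the scores
```

Because M is square and non-singular here, the factors are measured exactly: `M @ gamma` returns the original scores.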

8. The pooling square.—If the matrix of correlations of
a + b variates is:

$$\begin{bmatrix} R_{aa} & R_{ab} \\ R_{ba} & R_{bb} \end{bmatrix} \tag{23}$$

and if the standardized variates a are multiplied by weights
u, the standardized variates b by weights w, and each set
of scores summed to make two composite scores, the
resulting variances and covariances are:

$$\begin{bmatrix} u'R_{aa}u & u'R_{ab}w \\ w'R_{ba}u & w'R_{bb}w \end{bmatrix} \tag{24}$$

as can be seen by writing out the latter expressions at
length. The battery intercorrelation is therefore—

$$\frac{u'R_{ab}w \;\text{ or }\; w'R_{ba}u}{\sqrt{(u'R_{aa}u \times w'R_{bb}w)}} \tag{25}$$

If weights are applied to raw scores, each applied weight
must be multiplied by each pre-existing standard deviation,
in (25).

If there is only one variate in the a team, (25) becomes—

$$\frac{w'r_{ba}}{\sqrt{(w'R_{bb}w)}} \tag{26}$$

where $r_{ba}$ represents a whole column of correlation coefficients.
The values of w for which this reaches its maximum
value will satisfy the equation—

$$\frac{\partial}{\partial w}\,\frac{w'r_{ba}}{\sqrt{(w'R_{bb}w)}} = 0 \tag{27}$$

that is—

$$w = \text{a scalar} \times R_{bb}^{-1}r_{ba} \tag{28}$$

consistent with the ordinary method of deducing regression
coefficients.
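The pooling square lends itself to a direct numerical check. The sketch below assumes NumPy and simulated scores; the weights, the split into an a team of three and a b team of two, and the helper name `corr_with_first` are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 5))       # 200 persons, 5 variates (simulated)
scores[:, 3:] += 0.5 * scores[:, :2]     # induce correlation between the teams
S = (scores - scores.mean(0)) / scores.std(0)    # standardized variates
R = np.corrcoef(S, rowvar=False)

a, b = slice(0, 3), slice(3, 5)          # a team: first three; b team: last two
u = np.array([1.0, 2.0, 0.5])            # hypothetical weights for the a team
w = np.array([1.0, 3.0])                 # hypothetical weights for the b team

# Battery intercorrelation from the pooling square, equation (25).
r_formula = (u @ R[a, b] @ w) / np.sqrt((u @ R[a, a] @ u) * (w @ R[b, b] @ w))
# The same quantity computed directly from the two composite scores.
r_direct = np.corrcoef(S[:, a] @ u, S[:, b] @ w)[0, 1]

# One variate in the a team: weights proportional to R_bb^-1 r_ba, equation (28).
r_ba = R[b, 0]
w_best = np.linalg.solve(R[b, b], r_ba)

def corr_with_first(weights):
    """Equation (26): correlation of the weighted b team with the first variate."""
    return (weights @ r_ba) / np.sqrt(weights @ R[b, b] @ weights)
```

Any other choice of weights for the b team gives a correlation no larger than `corr_with_first(w_best)`, which is the multiple correlation.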

9. The regression equation.—If $z_0$ is the one variate in
the a team, $z_1, z_2, \ldots, z_n$ are the b team, and if—

$$\hat z_0 = w'z \tag{29}$$

we wish to make $S(z_0 - \hat z_0)^2$ a minimum, that is—

$$w = R_{bb}^{-1}r_{ba} \tag{30}$$

If R is the matrix of correlations of all the tests including
$z_0$ the regression estimate of any one of the tests from a
weighted sum of the others is given by—

$$\text{determinant } R_z = 0 \tag{31}$$

where $R_z$ is R with the row corresponding to the variate
to be estimated replaced by the row of variates.

9a. Relations between two sets of variates (Hotelling,
1935a, 1936; M. S. Bartlett, 1948).—If two sets of variates
have correlation coefficients—

$$\begin{bmatrix} R_{aa} & R_{ab} \\ R_{ba} & R_{bb} \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} A & C' \\ C & B \end{bmatrix}$$

and if the variates of the B team are fitted with weights b,
then the correlations of the B team, thus weighted, with
the separate tests of the A team are given by—

$$\frac{C'b}{\sqrt{b'Bb}} \tag{31.1}$$

and the square of the correlation coefficient between the
two teams is then—

$$\lambda = \frac{b'CA^{-1}C'b}{b'Bb} \tag{31.2}$$

The maximum intercorrelation, and other points of inflexion
in λ, will be given by—

$$d\lambda/db = 0, \quad\text{i.e.}\quad (CA^{-1}C' - \lambda B)b = 0 \tag{31.3}$$

a set of homogeneous equations in b. We must therefore
have—

$$|CA^{-1}C' - \lambda B| = 0 \tag{31.4}$$

an equation for λ with as many non-zero roots as the number
of variates in the smaller team. For any one of these
roots λ, the weights b are proportional to the co-factors of
any row of $(CA^{-1}C' - \lambda B)$. The corresponding weights
a for the A team are then found by condensing the team B
(using weights b) to a single variate and carrying out an
ordinary regression calculation.

The result is to “factorize” each team into as many
orthogonal axes as there are variates. These axes are related
to one another in pairs corresponding to the roots λ.
Each axis is orthogonal to all the others except its own
opposite number in the space of the other team, arising
from the same root λ as it does, to which axis it is inclined
at an angle arccos √λ. Where one team has m more
variates than the other, m of the roots will be zeros and
the corresponding axes will be at right angles to the whole
space of the other team. This form of factorizing has been
called by M. S. Bartlett (1948) external factorizing, since
the position of the “factors” or orthogonal axes in each
team, in each space, is dictated by the other team.

The weightings corresponding to the largest root give
the closest possible correlation of the two weighted teams.
If the two teams are duplicate forms of the same tests, this
is the maximum attainable battery or team reliability
(Thomson 1940, 1947, 1948). In this case Peel (Nature,
1947) has shown that a simpler equation than (31.4) gives
the required roots. If λ = μ², Peel’s equation is—

$$|C - \mu A| = 0 \tag{31.5}$$

where A differs from C only in the diagonal elements, which
in A are unities but in C are reliabilities $r_{ii}$ of the individual
tests.

Green (1950) gives a transformation of this equation
which enables Hotelling’s iterative process (see Chapter
VII) to be used to find μ, the maximum battery reliability.
For the diagonal elements $r_{ii} - \mu$ of the matrix (C − μA),
Green writes—

$$r_{ii} - \mu = (1 - \mu)\,r_{ii} - \mu\,(1 - r_{ii})$$

when (31.5) becomes equivalent to—

$$|DCD - \rho I| = 0 \tag{31.6}$$

wherein D is a diagonal matrix with elements $(1 - r_{ii})^{-1/2}$,
I is the unit matrix, and ρ = μ/(1 − μ). The latent vector
V corresponding to the largest latent root of DCD can then
be found by Hotelling’s process, and the best weights for
maximum battery reliability are proportional to DV = W.
The maximum reliability thus attained is—

$$\mu = W'CW / W'AW$$
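The equivalence of Peel's equation (31.5) and Green's form (31.6) can be checked on a small case. This is a sketch assuming NumPy, with hypothetical correlations and reliabilities for three tests given in duplicate forms; `numpy.linalg.eigh` replaces Hotelling's iterative process.

```python
import numpy as np

A = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])          # hypothetical correlations within one form
rel = np.array([0.8, 0.7, 0.6])          # hypothetical reliabilities r_ii
C = A.copy()
np.fill_diagonal(C, rel)                 # C differs from A only in its diagonal

D = np.diag((1.0 - rel) ** -0.5)         # Green's diagonal matrix
roots, vectors = np.linalg.eigh(D @ C @ D)
rho, V = roots[-1], vectors[:, -1]       # largest latent root and its vector, (31.6)
W = D @ V                                # best weights, up to a scalar
mu = rho / (1.0 + rho)                   # from rho = mu / (1 - mu)
mu_from_weights = (W @ C @ W) / (W @ A @ W)

# Largest root of Peel's equation |C - mu A| = 0, found directly.
mu_peel = np.max(np.linalg.eigvals(np.linalg.solve(A, C)).real)
```

All three routes give the same maximum battery reliability for these figures.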

10. Regression estimates of factors—When in the specifications—

$$z = Mf \tag{3}$$

the factors outnumber the tests, they cannot be measured
but only estimated. To all men with the same set of
scores z will be attributed the same set of estimated factors
$\hat f$ though their “true” factors may be different. The
regression method of estimation minimizes the squares of
the discrepancies between $\hat f$ and f, summed over the men.
The regression equation (31) will be for one factor $f_i$—

$$\begin{vmatrix} \hat f_i & z' \\ m_i & R \end{vmatrix} = 0 \tag{32}$$

where $m_i$ is a column of M. Expanding, we have—

$$\hat f_i = m_i'R^{-1}z$$

and in general—

$$\hat f = M'R^{-1}z \tag{33}$$

or, separating the common factors and the specifics—

$$\hat f_0 = M_0'R^{-1}z \tag{34}$$

$$\hat f_1 = M_1R^{-1}z \tag{35}$$

the latter of which shows that we know the proportionate
weights for each specific (the rows of $R^{-1}$) even before we
know whether that specific exists (Wilson, 1934, 194).
The matrix of covariances of the estimated factors is—

$$M'R^{-1}M = \begin{bmatrix} M_0'R^{-1}M_0 & M_0'R^{-1}M_1 \\ M_1R^{-1}M_0 & M_1R^{-1}M_1 \end{bmatrix} \tag{36}$$

a square idempotent matrix of order equal to the number
of factors, but trace only equal to the number of tests.

For one common factor, (34) reduces to Spearman’s
estimate—

$$\hat g = \frac{1}{1+S}\sum_i \frac{r_{ig}}{1 - r_{ig}^2}\,z_i \tag{34a}$$

where

$$S = \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2}$$

while $K = M_0'R^{-1}M_0$ in (36) reduces to S/(1 + S), the
variance of $\hat g$.
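For the one-factor case the agreement between (34) and Spearman's estimate (34a), and the properties claimed for (36), can be confirmed numerically. The sketch assumes NumPy and four hypothetical g saturations.

```python
import numpy as np

l = np.array([0.9, 0.8, 0.7, 0.6])       # hypothetical g saturations r_ig
m = np.sqrt(1.0 - l ** 2)
M0 = l[:, None]                          # common-factor loadings, n x 1
M1 = np.diag(m)                          # specific loadings, diagonal
M = np.hstack([M0, M1])                  # equation (9)
R = M @ M.T                              # equation (10)

weights = M0.T @ np.linalg.inv(R)        # regression weights for g, equation (34)
S = np.sum(l ** 2 / (1.0 - l ** 2))
spearman = (l / (1.0 - l ** 2)) / (1.0 + S)   # Spearman's weights, equation (34a)

cov_est = M.T @ np.linalg.inv(R) @ M     # covariances of estimated factors, (36)
K = cov_est[0, 0]                        # variance of the estimated g
```

Here `cov_est` has order five (one general factor and four specifics) but trace four, the number of tests.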

10a. Ledermann’s short cut (1938a, 1939b)—The above
requires the calculation of the reciprocal of the large square
matrix R. Ledermann’s short cut only requires the reciprocal
of a matrix of order equal to the number of common
factors. As long as the factors are orthogonal we have—

$$R = M_0M_0' + M_1^2 \tag{10}$$

and the identity

$$M_0'M_1^{-2}(M_0M_0' + M_1^2) = (M_0'M_1^{-2}M_0 + I)M_0' = (J + I)M_0' \text{ say.}$$

Premultiplying by $(I + J)^{-1}$ and postmultiplying by $R^{-1}$
we reach

$$(I + J)^{-1}M_0'M_1^{-2} = M_0'R^{-1} \tag{36.1}$$

and the left-hand quantity can then be used in Equation
(34).

This short cut requires modification when the factors are
oblique. See Equations (70.1) to (70.4) below.
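A short numerical check of identity (36.1) follows. It is a sketch assuming NumPy, with hypothetical loadings for five tests on two orthogonal common factors; only a 2 × 2 system is solved on the short-cut side.

```python
import numpy as np

M0 = np.array([[0.7,  0.3],
               [0.6,  0.4],
               [0.5, -0.3],
               [0.6, -0.4],
               [0.4,  0.5]])             # hypothetical common-factor loadings
uniq = 1.0 - np.sum(M0 ** 2, axis=1)     # diagonal of M1^2 (specific variances)
R = M0 @ M0.T + np.diag(uniq)            # equation (10)

scaled = M0 / uniq[:, None]              # M1^-2 M0
J = M0.T @ scaled                        # J = M0' M1^-2 M0, of order 2
short_cut = np.linalg.solve(np.eye(2) + J, scaled.T)   # left side of (36.1)
full = M0.T @ np.linalg.inv(R)           # right side, needs the 5 x 5 reciprocal
```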

11. Direct and indirect vocational advice—If $z_0$ is an
occupation and z a battery of tests, the estimate of a
candidate’s occupational ability is—

$$\hat z_0 = r_0'R^{-1}z \tag{37}$$

where the $r_0$ are the correlations of the occupation with the
tests. If $z_0$ can be specified in terms of the common
factors of z, and a specific $s_0$ independent of z, then an
indirect estimate of $z_0$ via the estimated $f_0$ is possible. We
have—

$$z_0 = m_0'f_0 + s_0 \tag{38}$$

where $m_0'$ is a row of occupation loadings for the common
factors $f_0$ of z, and also—

$$\hat f_0 = M_0'R^{-1}z$$

Substitution in (38), assuming an average $s_0$ (= 0)
gives—

$$\hat z_0 = m_0'M_0'R^{-1}z \tag{39}$$

But—

$$m_0'M_0' = r_0' \tag{40}$$

and (39) is identical with (37) (Thomson, 1936a). If, however,
$s_0$ is not independent of the specifics s of the battery,
(40) will not hold, and the estimate (39) made via an estimation
of the factors will not agree with the correct estimate
(37).

12. Computation methods—The “Doolittle” method of
computing regression coefficients is widely used in America
(Holzinger, 1937a, 82). Aitken’s method, used and
explained in the text, is in the present author’s opinion
superior (Aitken, 1937a and b, with earlier references).
Regression calculations and many others are all special
cases of the evaluation of a triple matrix product $XY^{-1}Z$,
where Y is square and non-singular, and X and Z may
be rectangular. The Aitken method writes these matrices
down in the form—

$$\left[\begin{array}{c|c} Y & Z \\ \hline X & \end{array}\right]$$

and applies pivotal condensation until all entries to the
left of the vertical line are cleared off. All pivots must
originate from elements of Y. By giving X and Z special
values (including the unit matrix I) the most varied
operations can be brought under the one scheme.

13. Bartlett’s estimate of factors—We have $z = M_0f_0 + M_1f_1$,
where $f_0$ and $f_1$ are column vectors of the common
and specific factors respectively and $M_1$ is a diagonal
matrix. Bartlett now makes the estimates of $f_0$ (written
$\tilde f_0$ here to distinguish them from the regression estimates
$\hat f_0$ of Section 10) such as will minimize the sum of the
squares of each person’s specifics over the battery of tests,
i.e.—

$$\frac{\partial}{\partial f_0}(f_1'f_1) = 0, \quad\text{or}\quad \left(\frac{\partial f_1}{\partial f_0}\right)'f_1 = 0$$

$$(-M_1^{-1}M_0)'(M_1^{-1}z - M_1^{-1}M_0\tilde f_0) = 0$$

$$M_0'M_1^{-2}z = M_0'M_1^{-2}M_0\tilde f_0 = J\tilde f_0 \text{ say}$$

$$\tilde f_0 = J^{-1}M_0'M_1^{-2}z \tag{41}$$

(Bartlett, 1937a, 100.)
One could also find the estimated specifics as—

$$\tilde f_1 = (M_1^{-1} - M_1^{-1}M_0J^{-1}M_0'M_1^{-2})z \tag{42}$$

Substituting $z = M_0f_0 + M_1f_1$ we get for the relation
between $\tilde f$ and f—

$$\begin{bmatrix} \tilde f_0 \\ \tilde f_1 \end{bmatrix} = \begin{bmatrix} I & J^{-1}M_0'M_1^{-1} \\ 0 & I - M_1^{-1}M_0J^{-1}M_0'M_1^{-1} \end{bmatrix}\begin{bmatrix} f_0 \\ f_1 \end{bmatrix} = Af \tag{43}$$

and for the covariances of $\tilde f$ we get—

$$AA' = \begin{bmatrix} I + J^{-1} & 0 \\ 0 & I - M_1^{-1}M_0J^{-1}M_0'M_1^{-1} \end{bmatrix} \tag{44}$$

The error variances and covariances of the common
factors are—

$$E\{(\tilde f_0 - f_0)(\tilde f_0 - f_0)'\} = J^{-1}M_0'M_1^{-1}E(f_1f_1')M_1^{-1}M_0J^{-1} = J^{-1} = (M_0'M_1^{-2}M_0)^{-1} \tag{45}$$

(Bartlett, 1937a, 100.)
When there is only one common factor, J becomes the
familiar quantity—

$$S = \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2}$$

(Bartlett, 1935, 200.)
As was first noted by Ledermann *—

$$I + J^{-1} = (M_0'R^{-1}M_0)^{-1} = K^{-1} \tag{46}$$

(quoted by Thomson, 1938a); and using this we see that
the back estimates of the original scores from the regression
estimates $\hat f_0$ are identical with the insertion of Bartlett’s
estimates $\tilde f_0$ in the common-factor part of the specification
equations, viz.—

$$M_0K^{-1}M_0'R^{-1}z = M_0J^{-1}M_0'M_1^{-2}z \tag{47}$$

(Thomson, 1938a.)
Bartlett has pointed out that, using the same identity, in
the form K = J(I − K), it is easy to establish the reversible
relation between his estimates and regression estimates—

$$\hat f_0 = K\tilde f_0, \qquad \tilde f_0 = K^{-1}\hat f_0 \tag{48}$$

(Bartlett, 1938)
and he summarizes their different interpretation and properties
by the formulae—

$$E\{\hat f_0\} = E\{f_0\} = 0, \qquad E\{(\hat f_0 - f_0)(\hat f_0 - f_0)'\} = I - K \tag{49}$$

$$E_t\{\tilde f_0\} = f_0, \qquad E_t\{(\tilde f_0 - f_0)(\tilde f_0 - f_0)'\} = J^{-1} = K^{-1}(I - K) \tag{50}$$

where E denotes averaging over all persons, $E_t$ over all
possible sets of tests (comparable with the given set in
regard to the amount of information on the group
factors $f_0$).
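The relations (46) to (48) between Bartlett's estimates and the regression estimates can be checked on a small example. The sketch assumes NumPy; the loadings and the score vector are hypothetical.

```python
import numpy as np

M0 = np.array([[0.7,  0.3],
               [0.6,  0.4],
               [0.5, -0.3],
               [0.6, -0.4],
               [0.4,  0.5]])             # hypothetical common-factor loadings
uniq = 1.0 - np.sum(M0 ** 2, axis=1)     # diagonal of M1^2
R = M0 @ M0.T + np.diag(uniq)

scaled = M0 / uniq[:, None]              # M1^-2 M0
J = M0.T @ scaled                        # J = M0' M1^-2 M0
K = M0.T @ np.linalg.inv(R) @ M0         # covariances of regression estimates

z = np.array([1.0, 0.4, -0.2, 0.3, 0.8])            # one person's scores
bartlett = np.linalg.solve(J, scaled.T @ z)         # equation (41)
regression = M0.T @ np.linalg.solve(R, z)           # equation (34)
```

The regression estimates are the Bartlett estimates shrunk by K, as in (48); the back estimates of the scores agree, as in (47).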
* Letter of October 28, 1937, to Thomson.

14. Indeterminacy—The fact that estimated factors, if
the factors outnumber the tests, necessarily have less than
unit variance has sometimes been expressed in the case of
one common factor by postulating an indeterminate
vector i whose variance completes unity. This i may be
regarded as the usual error of estimation, and is a function
of the specific abilities (Thomson, 1934, B.J.P., 25, 92). That
$M'R^{-1}M$ in Equation (36) is of rank less than its order also
expresses the indeterminacy, and allows the factors to be
rotated to different positions which nevertheless fulfil all
the required conditions. In the hierarchical case the
transformation which effects this is (Thomson, 1935a)—

$$f = B\varphi \tag{51}$$

where B means the required number of rows of—

$$B = I - 2qq'/q'q \tag{52}$$

in which—

$$q_i = l_i/m_i \quad \text{(see Equation 7)} \tag{53}$$

as far as there exist tests, after which q is arbitrary.

For—

$$z = Mf = MB\varphi = M\varphi$$

since—

$$MB = M \tag{54}$$

and z is thus expressed by identical specification equations
in terms of new factors φ. For such transformations in the
case of multiple factors see Thomson, 1936a, 40; and
Ledermann, 1938c.
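Equations (52) to (54) can be illustrated for three tests. The sketch assumes NumPy and hypothetical saturations. It also assumes that the element of q belonging to g is taken as −1, the value that makes Mq vanish; the text above states only the elements belonging to the specifics.

```python
import numpy as np

l = np.array([0.9, 0.8, 0.7])            # hypothetical g saturations
m = np.sqrt(1.0 - l ** 2)
M = np.column_stack([l, np.diag(m)])     # Spearman form (7), 3 tests x 4 factors

# Assumption of this sketch: the element of q for g is -1, so that M q = 0;
# the remaining elements are q_i = l_i / m_i as in equation (53).
q = np.concatenate([[-1.0], l / m])
B = np.eye(4) - 2.0 * np.outer(q, q) / (q @ q)   # equation (52)
```

B is orthogonal and leaves the loadings unchanged, so the new factors satisfy the same specification equations as the old.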

If the matrix M is divided into the part $M_0$ due to
common factors and the part $M_1$ due to specifics, as in
Equation (9), then Ledermann shows that if U is any
orthogonal matrix of order equal to the number of common
factors, the matrix—

$$B = I - Q(Q'Q)^{-1/2}(I - U)(Q'Q)^{-1/2}Q'$$

wherein—

$$Q = \begin{bmatrix} -I \\ M_1^{-1}M_0 \end{bmatrix}$$

will satisfy the equation—

$$MB = M$$

Indeterminacy is entirely due to the excess of factors
over tests, i.e. to the fact that the matrix of loadings M
is not square. It can be in theory abolished by adding
a new test which contains no new factor, not even a new
specific; or a set of new tests which add fewer factors
than their number, so that M becomes square (Thomson,
1934b; 1935a, 253). In the case of a hierarchy each of
these tests singly will conform to the hierarchy, so that
their saturations l can be found; but jointly they break
the hierarchy. If they add no new factors, g can then be
found without any indeterminacy.

15. Finding g saturations from an imperfectly hierarchical 
battery—The Spearman formula given in Chapter III, 
Section 5, is the most usual method. A discussion of other 
methods will be found in Burt (1936, 283-7). See also 
Thomson (1934a, 870), for an iterative process modified 
from Hotelling. 

16. Sampling errors of tetrad-differences—The formulae
(16) and (16a) given in the text are both approximations,
but appear to be very good approximations. The primary
papers are Spearman and Holzinger (1924 and 1925).
Critical examinations of the formulae have been made by
Pearson and Moul (1927), and Pearson, Jeffery, and Elderton
(1929). Wishart (1928) has considered a quantity P
which is equal to $P'N^2/(N - 1)(N - 2)$, where P' is the
tetrad-difference of the covariances a instead of the correlations,
and obtained an exact expression for the standard
deviation σ of P—

$$(N - 2)\sigma^2 = \frac{N+1}{N-1}D_{12}D_{34} - D \tag{55}$$

where the D’s are determinants of the following matrix
and its quadrants:

$$\left[\begin{array}{cc|cc}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{array}\right]$$

But approximate assumptions are necessary when the
standard deviation of the ordinary tetrad-difference of the
correlations is deduced from that of P. The result for
the variance of the tetrad-difference is—

$$\frac{1}{N-2}\left[\frac{N+1}{N-1}(1 - r_{12}^2)(1 - r_{34}^2) - R\right] \tag{56}$$

where R is the 4 × 4 determinant of the correlations.

17. Selection from a multivariate normal population.—
The primary papers are those of Karl Pearson (1902 and
1912). The matrix form given in the text (Chapter XIX,
Section 2) is due to Aitken (1934), who employed Soper’s
device of the moment-generating function, and made a
free use of the notation and methods of matrices. A
variant of it which is sometimes useful has been given by
Ledermann (Thomson and Ledermann, 1938) as follows.
If the original matrix is subdivided in any symmetrical
manner:

$$\begin{bmatrix}
R_{pp} & R_{pq} & R_{ps} & R_{pt} \\
R_{qp} & R_{qq} & R_{qs} & R_{qt} \\
R_{sp} & R_{sq} & R_{ss} & R_{st} \\
R_{tp} & R_{tq} & R_{ts} & R_{tt}
\end{bmatrix}$$

and $R_{pp}$ is changed by selection to $V_{pp}$ then each resulting
sub-matrix, including $V_{pp}$ itself, is given by the formula—

$$V_{qs} = R_{qs} - R_{qp}E_{pp}R_{ps}$$

where—

$$E_{pp} = R_{pp}^{-1} - R_{pp}^{-1}V_{pp}R_{pp}^{-1} \tag{57}$$
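Formula (57) can be exercised on a small case. The sketch assumes NumPy, a hypothetical 4 × 4 correlation matrix split into a selected pair p and an unselected pair q, and a hypothetical covariance matrix of the selected variates after selection.

```python
import numpy as np

R = np.array([[1.00, 0.50, 0.40, 0.30],
              [0.50, 1.00, 0.35, 0.25],
              [0.40, 0.35, 1.00, 0.20],
              [0.30, 0.25, 0.20, 1.00]])   # hypothetical correlations
p, q = [0, 1], [2, 3]                      # selected and unselected variates
Rpp, Rpq = R[np.ix_(p, p)], R[np.ix_(p, q)]
Rqp, Rqq = R[np.ix_(q, p)], R[np.ix_(q, q)]

Vpp = np.array([[0.5, 0.1],
                [0.1, 0.6]])               # hypothetical covariances after selection
Rpp_inv = np.linalg.inv(Rpp)
E = Rpp_inv - Rpp_inv @ Vpp @ Rpp_inv      # equation (57)

Vpp_check = Rpp - Rpp @ E @ Rpp            # the formula applied to R_pp itself
Vqp = Rqp - Rqp @ E @ Rpp                  # covariances of q with p after selection
Vqq = Rqq - Rqp @ E @ Rpq                  # covariances within q after selection
```

Two familiar consequences follow from these figures: the regression of the unselected on the selected variates is unchanged by selection, and so is the residual covariance matrix about that regression.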


17a. Maximum likelihood estimation—The maximum
likelihood equations for estimating factor loadings (Lawley,
1940, 1941, 1943b) may be expressed fairly simply in the
notation of previous sections. It is necessary, however,
to distinguish between the matrix of observed correlations,
which we shall denote by $R_0$, and the matrix—

$$R = M_0M_0' + M_1^2$$

which represents that part of $R_0$ which is “explained” by
the factors.
The equations may then be written—

$$M_0' = M_0'R^{-1}R_0 \tag{58}$$

These are not very suitable for computational work.
It may, however, be shown that:

$$M_0'R^{-1} = (I - K)M_0'M_1^{-2} = (I + J)^{-1}M_0'M_1^{-2} \tag{59}$$

where, as before,

$$K = M_0'R^{-1}M_0, \qquad J = M_0'M_1^{-2}M_0$$

Hence our equations may be transformed into the
form—

$$M_0' = (I + J)^{-1}M_0'M_1^{-2}R_0 \tag{60}$$

or alternatively,

$$M_0' = J^{-1}(M_0'M_1^{-2}R_0 - M_0') \tag{61}$$

When there are two or more general factors the above
equations will have an infinite number of solutions corresponding
to all the possible rotations of the factor axes.
A unique solution may, however, be found such as to
make J a diagonal matrix.

Finally, if we put—

$$L = M_0'M_1^{-2}R_0 - M_0', \qquad V = LM_1^{-2}M_0,$$

then, from the last set of equations

$$V = JM_0'M_1^{-2}M_0 = J^2$$

Hence we have—

$$M_0' = V^{-1/2}L \tag{62}$$

These equations have been found the most convenient in
practice, since they can be solved by an iterative process.
When first approximations to $M_0$ and $M_1$ have been obtained,
they can be used to provide second approximations
by substitution in the right-hand side.
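One substitution in the right-hand side of (62) can be sketched as below. The sketch assumes NumPy, hypothetical loadings, and an "observed" matrix that is explained exactly by the factors, so that the loadings should be reproduced unchanged. The function name `lawley_step` is illustrative, and the step revises the common-factor loadings only; the revision of the specific variances is not shown.

```python
import numpy as np

def lawley_step(M0, uniq, R0):
    """One substitution in the right-hand side of equation (62)."""
    scaled = M0 / uniq[:, None]          # M1^-2 M0
    L = scaled.T @ R0 - M0.T             # L = M0' M1^-2 R0 - M0'
    V = L @ scaled                       # V = L M1^-2 M0
    roots, vecs = np.linalg.eigh(V)
    V_inv_sqrt = vecs @ np.diag(roots ** -0.5) @ vecs.T
    return (V_inv_sqrt @ L).T            # new M0, from M0' = V^(-1/2) L

M0 = np.array([[0.7,  0.3],
               [0.6,  0.4],
               [0.5, -0.3],
               [0.6, -0.4],
               [0.4,  0.5]])             # hypothetical loadings
uniq = 1.0 - np.sum(M0 ** 2, axis=1)     # diagonal of M1^2

# Rotate the axes so that J = M0' M1^-2 M0 is diagonal (the unique solution).
_, U = np.linalg.eigh(M0.T @ (M0 / uniq[:, None]))
M0 = M0 @ U
J = M0.T @ (M0 / uniq[:, None])

R0 = M0 @ M0.T + np.diag(uniq)           # observed matrix, explained exactly
M0_next = lawley_step(M0, uniq, R0)
```

With real data $R_0$ differs from R, and the step is repeated until the loadings settle.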

18. Reciprocity of loadings and factors in persons and
traits (Burt, 1937b).—Let W be a matrix of scores centred
both by rows and columns. Its dimensions are traits ×
persons (t . p), and its rank is r where r is smaller than
both t and p in consequence of the double centring. The
two matrices of covariances are WW' for traits and W'W
for persons, and by a theorem first enunciated by Sylvester
in 1883 (independently discovered by Burt), their non-zero
latent roots are the same. If their dimensions differ,
i.e. t ≠ p, the larger one will have additional zero roots.
Let the non-zero roots form the diagonal matrix D. Then
the principal axes analyses are:

$$W = H_1D^{1/2}F_1, \quad \text{dimensions } (t . r)(r . r)(r . p)$$

and

$$W' = H_2D^{1/2}F_2, \quad \text{dimensions } (p . r)(r . r)(r . t)$$

where $H_1$ and $H_2$ are the latent vectors of WW' and W'W,
while $F_1$ is the matrix of factors possessed by persons,
$F_2$ that of factors possessed by traits. From the analysis
of W we have, taking the transpose—

$$W' = F_1'D^{1/2}H_1', \quad \text{dimensions } (p . r)(r . r)(r . t)$$

and comparison of this with the former expression for W'
makes the reciprocity of $H_2$ and $F_1'$, $F_2$ and $H_1'$, evident.
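The reciprocity can be seen at once in a singular value decomposition, which delivers $H_1$, $D^{1/2}$, and $F_1$ together. The sketch assumes NumPy and randomly generated scores for four traits and six persons; the use of `numpy.linalg.svd` is a convenience of the sketch, not the method of the text.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 6))              # traits x persons (simulated)
# Centre by rows and by columns.
W = X - X.mean(axis=0, keepdims=True) - X.mean(axis=1, keepdims=True) + X.mean()

H1, s, F1 = np.linalg.svd(W, full_matrices=False)
r = int(np.sum(s > 1e-10))               # rank after the double centring
H1, s, F1 = H1[:, :r], s[:r], F1[:r]
D = np.diag(s ** 2)                      # the non-zero latent roots

roots_traits = np.sort(np.linalg.eigvalsh(W @ W.T))[::-1][:r]
roots_persons = np.sort(np.linalg.eigvalsh(W.T @ W))[::-1][:r]
```

Transposing `H1 @ np.sqrt(D) @ F1` gives the analysis of W', with the roles of latent vectors and factors interchanged.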

19. Oblique factors. Structure and pattern—In Thur- 
stone’s notation, which we shall follow in this paragraph, 
the matrix M of our equation (3), when it refers to centroid 
factors, is called F. Our equation (8) becomes in his 
notation— 

s=Fp 
Since centroid factors are orthogonal, F is both a pattern 
and a structure. The structure is the matrix of correla- 
tions between tests and factors, i.e. : 
Structure = sp’ = (Fp)p' = F(pp’) = FI = F = Pattern. 

When the factors are oblique, however, this is not the 
case. In that case, Structure = Pattern x matrix of 
correlations between the factors. 

Thurstone turns the centroid factors to a new set of 
positions (still within the common-factor space, and in 
general oblique to one another) called reference vectors. 
The rotating matrix is A, and 

Vek : . (63) 
is the structure on the reference vectors. The cosines of 
the angles between the reference vectors are given by A’A. 
Vis not a pattern. Its rows cannot be used as coefficients 
in equations specifying a man’s scores in the tests, given 
his scores in the reference vectors. The pattern on the 
reference vectors would not have those zeros which are 
found in V. 

The primary factors are the lines of intersection of the 
hyperplanes which are at right angles to the reference 
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vectors, taken (r — 1) at a time where r is the number of 
«common factors, the number of dimensions in the common- 
factor space. They are defined, therefore, by the equations 
of the hyperplanes, taken (r — 1) at a time. These 
equations are Neto s : . (64) 
where æ is a column vector of co-ordinates along the 
centroid axes. The direction cosines of the intersections 
of these hyperplanes taken (r — 1) at a time are therefore 
proportional to the elements in the columns of (A’)7', 
and to make them into direction cosines this has to have 
its columns normalized by post-multiplication by a diagonal 
matrix D, giving for the structure on the primary factors 

F(A’) D : a . (65) 

D is also the matrix of correlations between the reference 
vectors and the primary factors, for 

A(A’))7D=D . E . (66) 

Each primary factor is therefore correlated with its own 

reference vector but orthogonal to all the others, as can 


also be easily seen geometrically. 
The matrix of intercorrelations of the primary factors 


is DA7"(.A’)~"D from equation (65). 
If W is the pattern on the primary factors p, so that 


test scores s = Wp 
then the structure on the primary factors is also 
sp’ = Wpp' 
where pp’ is the matrix of correlations between the primary 
factors, and therefore 


primary factor structure = WDA7(A’)*D (67) 
Also, this structure = F(A’)~D from (65). 
Equating these we have : 
WDA =F 
whence W=FAD". 3 . (68) 
= KD} i . (69) 


We have, therefore, 
Structure Paitern 


oA et (70) 


Refereñce vectors : 
+ ECA) AD eee 


Primary factors - 


e 
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where the reference-vector pattern has been entered 
by analogy but could easily be independently founde 
It will be seen that the structure and pattern of the 
primary factors are identical with the pattern and struc- 
ture of the reference vectors except for the diagonal 
matrix D. The structure of the one is the pattern of the 
other multiplied by D. 

This theorem is not confined to the case of simple 
structure, but is more general, and applies to any two sets 
of oblique axes with the same origin O, of which the axes 
of the one set are intersections of “ primes ” taken r — 1 
at a time in the space of r dimensions, and the axes of the 
other set are lines perpendicular to those primes. By 
prime is meant a space of one dimension less than the whole, 
i.e. Thurstone’s-hyperplane. The projections of any point 
P on to the one set of axes are identical with the projections 
thereon of its oblique co-ordinates on the other set, which 
sentence is equivalent to the matrix identities (see 70)— 

; FA =FAD" x D 
and = F(A’)"1D = F(A’)-! x D 

Structure ] _ Pattern on { Cosines to project it 

on one set{ other set | on to the first set. 
A diagram makes this obvious in the two-dimensional case 
and gives the key to the situation. A perspective diagram 
of the three-dimensional case is not very difficult to make 
and is still more illuminating. The vector (or test) OP 
is the “resultant ” of its oblique co-ordinates (the pattern), 
but not of its projections (the structure). It is of interest 
to notice that, either on the reference vectors or on the 
primary factors— 

Pattern x Transpose of Structure = Test-correlations. 
This serves as a useful check on calculations. It is geo- 
metrically immediately obvious. For consider a space 
defined by n oblique axes, with origin O, and any two 
points P and Q each at unit distance from O. The direc- 
tions OP and OQ may be taken as vectors corresponding 
to two tests, and cos POQ to the test correlation. 

Consider the pattern, on these axes, of OP, and the
structure, on the same axes, of OQ. The former is composed
of the oblique co-ordinates of the point P, the latter
of the projections on the axes of the point Q, which
projections (OQ being unity) are cosines. Then the inner
product of those oblique co-ordinates of P with these cosines
obviously adds up to the projection of OP on OQ, that is
to cos POQ, or the correlation coefficient.
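
The two-dimensional case of this argument is easily worked through. The sketch below is an illustration only: the directions of the two oblique axes (10 and 70 degrees) and of the test vectors OP and OQ (25 and 55 degrees) are assumed values, chosen so that the angle POQ is 30 degrees.

```python
import math

def unit(degrees):
    """Unit vector in the plane at the given angle."""
    t = math.radians(degrees)
    return (math.cos(t), math.sin(t))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def oblique_coordinates(p, u1, u2):
    """Solve c1*u1 + c2*u2 = p by Cramer's rule (the pattern of p)."""
    det = u1[0] * u2[1] - u2[0] * u1[1]
    c1 = (p[0] * u2[1] - u2[0] * p[1]) / det
    c2 = (u1[0] * p[1] - p[0] * u1[1]) / det
    return (c1, c2)

# Assumed directions of the two oblique axes and of the two tests.
u1, u2 = unit(10.0), unit(70.0)
P, Q = unit(25.0), unit(55.0)

pattern_P = oblique_coordinates(P, u1, u2)   # oblique co-ordinates of P
structure_Q = (dot(Q, u1), dot(Q, u2))       # projections (cosines) of Q

# Pattern of P times structure of Q should give cos POQ.
inner = dot(pattern_P, structure_Q)
cos_POQ = dot(P, Q)
```

Here `inner` and `cos_POQ` agree to rounding error, both being the cosine of 30 degrees, because the oblique co-ordinates rebuild OP as a resultant and each component is then projected on OQ.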

In estimating oblique factors by regression, since the 
correlations between factors and tests must be used, the 
relevant equation is 

$\hat{f}_0 = \{F(\Lambda')^{-1}D\}' R^{-1} z$     (70.1)
Ledermann's short cut (section 10a above) requires considerable
modification for oblique factors. We no longer have
$R = M_0 M_0' + M_1^2$     (10)
but
Pattern × transpose of structure + $M_1^2$ = R,
i.e. in Thurstone's notation
$(F\Lambda D^{-1})\{F(\Lambda')^{-1}D\}' + F_1^2 = R$     (70.2)
and using this (Thomson, 1949), we reach the equation
$\hat{f}_0 = (I + J)^{-1}\{F(\Lambda')^{-1}D\}' F_1^{-2} z$     (70.3)
where now
$J = \{F(\Lambda')^{-1}D\}' F_1^{-2} (F\Lambda D^{-1})$     (70.4)
in place of Ledermann's $J = M_0' M_1^{-2} M_0$.

Only reciprocals of matrices of order equal to the 
number of common factors are now required, but the 
calculation, like all concerning oblique factors, is still one 
of considerable labour. 
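
The passage from the regression equation to the short-cut form can be sketched in a few lines. The shorthand S for the primary-factor structure and P for the primary-factor pattern is introduced here for brevity only, and the sketch assumes that the unique-factor term is the diagonal matrix $F_1^2$ and that z denotes the test scores.

```latex
\begin{align*}
S &= F(\Lambda')^{-1}D, \qquad P = F\Lambda D^{-1}, \qquad R = PS' + F_1^2, \\
J &= S'F_1^{-2}P, \\
(I + J)\,S'R^{-1} &= S'F_1^{-2}\,(F_1^2 + PS')\,R^{-1} = S'F_1^{-2}, \\
\hat{f}_0 = S'R^{-1}z &= (I + J)^{-1}S'F_1^{-2}z .
\end{align*}
```

Since $F_1^2$ is diagonal its reciprocal is immediate, and the only matrix remaining to be inverted is I + J, of order equal to the number of common factors.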

19a. Second-order factors.—The above primary factors
can themselves in their turn be factorized into one, two, or
more second-order factors, and a factor-specific for each
primary. If the rank of the matrix of intercorrelations
of the primaries can be reduced by diagonal entries to say
two, then the r primaries will be replaced by r + 2
second-order factors which will no longer be in the original
common-factor space. The correlations of the primaries
with these second-order factors will form an oblong matrix
with its first two columns filled, but each succeeding
column will have only one entry corresponding to a
factor-specific, thus:
where subscripts must be supplied to indicate the primary
(the row) and the second-order factor (the column).
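
The layout just described may be sketched for a particular case. The display below is an illustration only, assuming r = 4 primaries and two general second-order factors; the symbol e for the entries is a placeholder chosen here, with subscripts giving the primary (row) and the second-order factor (column).

```latex
E =
\begin{bmatrix}
e_{11} & e_{12} & e_{13} & 0      & 0      & 0      \\
e_{21} & e_{22} & 0      & e_{24} & 0      & 0      \\
e_{31} & e_{32} & 0      & 0      & e_{35} & 0      \\
e_{41} & e_{42} & 0      & 0      & 0      & e_{46}
\end{bmatrix}
```

The first two columns are the loadings on the general second-order factors, and the remaining four columns each carry the single loading of one factor-specific.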

The primary factors can be thought of as added to the 
actual tests, their direction cosines being added as rows 
below F, which thus becomes : 

$\begin{bmatrix} F \\ D\Lambda^{-1} \end{bmatrix}$
Imagine this matrix post-multiplied by a rotating matrix
$\Psi$, with r rows and r + 2 columns, which will give the
correlations with the r + 2 second-order factors. The
lower part of the resulting matrix will be E, which we
already know. That is:
$D\Lambda^{-1}\Psi = E$     (71)
$\Psi = \Lambda D^{-1}E$     (72)
and the correlations of the original tests with the
second-order factors are then:
$G = F\Psi = F\Lambda D^{-1}E = VD^{-1}E$     (73)
G is both a structure and a pattern, with continuous 
columns equal in number to the general second-order 
factors, followed by a number of columns equal to the 
number of primaries, this second part forming an orthog- 
onal simple structure. 

20. Boundary conditions.—These refer to the conditions 
under which a matrix of correlation coefficients can be 
explained by orthogonal factors which run each through 
only a given number of tests. The problem was first 
raised by Thomson (1919b) and a beginning made with 
its solution (J. R. Thompson, Appendix to Thomson’s 
paper). Various papers by J. R. Thompson culminated 
in that of 1929, and see also Black (1929). Thomson
returned to the problem in connexion with rotations in the
common-factor space (Thomson, 1936b), and Ledermann
gave rigorous proofs of the theorems enunciated by
Thomson and Thompson and extended them (Ledermann,
1936). A necessary condition is that if the largest latent
root of the matrix of correlations exceeds the integer s,
then factors which run through s tests only and have zero
loadings in the other tests are certainly inadequate. This
rule has not been proved to be sufficient, and when applied
to the common-factor space only it is certainly not
sufficient, though it seems to be a good guide. Ledermann
(1936, 170-4) has given a stringent condition as follows.
If we define the nullity of a square matrix as order minus
rank, then if it is to be possible to factorize orthogonally a
matrix R of rank r in such a way that the matrix of
loadings contains at least r zeros in each of its columns, the
sum of the nullities of all the r-rowed principal minors of
R must at least be equal to r.
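
The latent-root rule lends itself to a short calculation. The following is a minimal sketch, assuming a hypothetical correlation matrix of four tests with unities in the diagonal (the numbers are invented for illustration) and using simple power iteration, which is one of several ways of finding the largest latent root.

```python
import math

def largest_latent_root(R, iterations=500):
    """Power iteration for the largest latent root of a matrix of
    positive correlations, scaling the trial vector to unit maximum."""
    n = len(R)
    v = [1.0] * n
    root = 0.0
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        root = max(abs(x) for x in w)
        v = [x / root for x in w]
    return root

# Hypothetical correlation matrix (not taken from the text).
R = [[1.0, 0.6, 0.5, 0.4],
     [0.6, 1.0, 0.5, 0.4],
     [0.5, 0.5, 1.0, 0.3],
     [0.4, 0.4, 0.3, 1.0]]

root = largest_latent_root(R)
# Factors running through s tests only are ruled out whenever root > s,
# so the smallest s not excluded by this necessary condition is:
smallest_s = math.ceil(root)
```

For this matrix the root lies between 2.35 and 2.5, so factors running through only two tests each are certainly inadequate, while the rule does not exclude factors running through three. As the text remarks, passing the rule is not a proof of adequacy.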

21. The sampling of bonds.—The root idea is that of the 
complete family of variates that can be made by all possible 
additive combinations of bonds from a given pool, and 
the complete family of correlation coefficients between 
pairs of these. Thomson (1927) mooted the idea and 
worked out the example quoted in Chapter XX. He 
had earlier (1927a) shown that with all-or-none bonds the
most probable value of a correlation coefficient is $\sqrt{p_1 p_2}$,
where the p's are fractions of the whole pool forming the
variates, and the most probable value of a tetrad-difference
F, zero. Mackie (1928a) showed that the mean
tetrad-difference is zero, and its variance, for F:

$\sigma_F^2 = \frac{1}{N-1}\{p_1p_2 + p_2p_3 + p_3p_4 + p_1p_4 - 2(p_1p_2p_3 + p_1p_2p_4 + p_1p_3p_4 + p_2p_3p_4) + 4p_1p_2p_3p_4\} + \frac{2(N-2)}{(N-1)^3}(1-p_1)(1-p_2)(1-p_3)(1-p_4)$
where N is the number of bonds in the whole pool. He
found for the mean value of $r_{12}$ the value $\sqrt{p_1 p_2}$, and for
its variance:

$\sigma_r^2 = \frac{(1-p_1)(1-p_2)}{N-1}$
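
These all-or-none results can be verified on a small pool by enumerating every possible sample. The sketch below is an illustration under stated assumptions: each test draws a fixed number of bonds, every selection being equally likely, the correlation of two tests is the number of shared bonds divided by the geometric mean of their sizes, and the tetrad-difference is taken as F = r12 r34 - r14 r23, the pairing which matches the products p1p2, p2p3, p3p4 and p1p4. The pool sizes are arbitrary small numbers.

```python
from itertools import combinations, product
from math import sqrt

def subsets(n_bonds, size):
    """All ways a test can draw `size` bonds from a pool of `n_bonds`."""
    return [frozenset(c) for c in combinations(range(n_bonds), size)]

def corr(a, b):
    """Correlation of two all-or-none variates: shared bonds / sqrt(sizes)."""
    return len(a & b) / sqrt(len(a) * len(b))

def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

def tetrad_variance(N, p):
    """Variance of F = r12*r34 - r14*r23 for pool fractions p = (p1, p2, p3, p4)."""
    p1, p2, p3, p4 = p
    pairs = p1 * p2 + p2 * p3 + p3 * p4 + p1 * p4
    triples = p1 * p2 * p3 + p1 * p2 * p4 + p1 * p3 * p4 + p2 * p3 * p4
    lead = (pairs - 2 * triples + 4 * p1 * p2 * p3 * p4) / (N - 1)
    tail = 2 * (N - 2) / (N - 1) ** 3 * (1 - p1) * (1 - p2) * (1 - p3) * (1 - p4)
    return lead + tail

# Two tests: a pool of 6 bonds, test 1 drawing 3 and test 2 drawing 2.
N2, n1, n2 = 6, 3, 2
r_values = [corr(a, b) for a, b in product(subsets(N2, n1), subsets(N2, n2))]
r_mean, r_var = mean_and_variance(r_values)
r_mean_formula = sqrt((n1 / N2) * (n2 / N2))
r_var_formula = (1 - n1 / N2) * (1 - n2 / N2) / (N2 - 1)

# Four tests: a pool of 5 bonds, the tests drawing 2, 3, 2 and 3 bonds.
N4, sizes = 5, (2, 3, 2, 3)
pools = [subsets(N4, s) for s in sizes]
f_values = [corr(a, b) * corr(c, d) - corr(a, d) * corr(b, c)
            for a, b, c, d in product(*pools)]
f_mean, f_var = mean_and_variance(f_values)
f_var_formula = tetrad_variance(N4, tuple(s / N4 for s in sizes))
```

The enumerated mean and variance of the correlation, and the enumerated variance of the tetrad-difference, agree with the formulae to rounding error, and the mean tetrad-difference is zero.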
This is not the variance of all possible correlation
coefficients, but of those formed by taking fractions $p_1$ and
$p_2$ of the pool. The whole family of correlation coefficients
will be widely scattered by reason of the different values
of p, “rich” tests having high correlations, and those
with low p, low correlations. Mackie (1929) next extended
these formulae to variable coefficients (i.e. bonds which no
longer were all-or-none). He again found the mean value
of F to be zero, and for its variance:
4(N — 1X N — 2) [2 oyla oN E 

EE aa aan 


The presence of $\pi$ in this is due to Mackie's limitation to
positive loadings of the bonds. Thomson (1935b, 72)
removed this limitation and found:


Similarly, Mackie found for variable positive loadings 


(1929)— 
aar) 


and for all loadings Thomson found (1935b):

$\sigma_r^2 = \frac{1}{N}$

Thomson suggested without proof that in general, when
limits are set to the variability of the loadings of the bonds,
resulting in a family of correlation coefficients averaging $\bar{r}$,
these correlations will form a distribution with variance:


me! = 
a? =p" p~ TA) 
and will give tetrad-differences averaging zero with a 


variance— 


4(N — 1)(N — 2) f_ =| 2 =e 
(oy = ae ie Pei? ny} Æ 2(N Da E 72)? 
Summing up, Thomson says (1935b, 77-8): “The sampling
principle taken alone gives correlations of all values
. . . and zero tetrad-differences if N be large. Fitting the
sampled elements with weights .. . if the weights may 
be any weights . . . destroys correlation when N is infinite. 
This means that on the Sampling Theory a certain approxi- 
mation to ‘all-or-none-ness’ is a necessary assumption 
—not to explain zero tetrad-differences, but to explain 
the existence of correlations of . . . large size.... The 
most important point in all this appears to me to be the 
fact that on all these hypotheses the tetrad-differences tend to 
vanish. This tendency appears to be a natural one among 
correlation coefficients.” 

A tendency for tetrad-differences to vanish means, of 
course, a still stronger tendency for large minors of the 
correlational matrix to vanish. In more general terms, 
therefore, Thomson’s theorem is that in a complete family 
of correlation coefficients the rank of the correlation matrix 
tends towards unity, and that a random sample of variates 
from this family will (in less strong measure) show the 
same tendency. 